Abstract

We consider a classical problem in choice theory, vote aggregation, using novel distance measures between permutations that arise in several practical applications. The distance measures are derived through an axiomatic approach, taking into account various issues arising in voting with side constraints. The side constraints of interest include non-uniform relevance of the top and the bottom of rankings (or equivalently, eliminating negative outliers in votes) and similarities between candidates (or equivalently, introducing diversity in the voting process). The proposed distance functions may be seen as weighted versions of the Kendall $\tau$ distance and weighted versions of the Cayley distance. In addition to proposing the distance measures and providing the theoretical underpinnings for their applications, we also consider algorithmic aspects associated with distance-based aggregation processes. We focus on two methods. One method is based on approximating weighted distance measures by a generalized version of Spearman's footrule distance, and it has provable constant approximation guarantees. The second class of algorithms is based on a non-uniform Markov chain method inspired by PageRank, for which currently only heuristic guarantees are known. We illustrate the performance of the proposed algorithms for a number of distance measures for which the optimal solution may be easily computed.

I. Introduction

Rank aggregation, sometimes referred to as ordinal data fusion, is a classical problem frequently encountered in the social
sciences, web search and Internet service studies, expert opinion analysis, and economics [8], [11], [20], [24], [25]. Rank
aggregation plays a special role in information retrieval based on different search models, in cases when users initiate several
queries for the information of interest to them, or in situations when one has to combine various sources of evidence or use
different document surrogates [11].

The problem can be succinctly described as follows: a set of "voters" or "experts" is presented with a set of candidates (objects, individuals, movies, etc.). Each voter's task is to produce a ranking, that is, an arrangement of the candidates in which the candidates are ranked from the most preferred to the least preferred. The voters' rankings are then passed to an aggregator. The aggregator outputs a single ranking, termed the aggregate ranking, to be used as a representative of all voters.

Rank aggregation for votes including two candidates reduces to a simple majority count. The situation becomes significantly more complex when three or more candidates are considered. Two of the most obvious extensions of vote aggregation for two candidates to the case of more than two candidates are the majority rule and the Condorcet method (pairwise majority count). In the first case, one reduces the problem to counting how many times each candidate ended up at the top of the list. The candidate with the most first-place votes is declared the winner and removed from all rankings. The same process is then repeated to identify the second, third, and subsequent candidates in the list. In the second case, one aims at identifying the majority winner of pairwise competitions. Unfortunately, both methods are plagued by a number of issues that have cast doubt on the plausibility of fair vote aggregation. Examples include the famous Condorcet paradox [5], where pairwise comparisons may lead to intransitive results (for example, a may be preferred to b, b to c, and c to a).
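The intransitivity mentioned above can be reproduced with three voters. The following is a minimal sketch; the function name `pairwise_majority` and the three sample votes are assumptions made for illustration and do not come from the paper.

```python
from itertools import combinations

def pairwise_majority(votes):
    """Return the set of ordered pairs (x, y) such that a strict
    majority of the votes ranks x before y."""
    candidates = sorted(votes[0])
    wins = set()
    for x, y in combinations(candidates, 2):
        x_first = sum(v.index(x) < v.index(y) for v in votes)
        if 2 * x_first > len(votes):
            wins.add((x, y))
        elif 2 * x_first < len(votes):
            wins.add((y, x))
    return wins

# Hypothetical votes, each listed from most to least preferred.
votes = [("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]
print(pairwise_majority(votes))
```

With these votes, a beats b, b beats c, and c beats a, so the pairwise majority relation contains a cycle and no Condorcet winner exists.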

To mitigate such problems, two other important categories of rank aggregation methods were studied in the past. These
include score-based methods and distance-based methods. In score-based methods, the first variant of which was proposed
by Borda [4], each candidate is assigned a score based on its position in each of the votes (rankings). The candidates are
then ranked based on their total score. One argument in support of using Borda's count method is that it ranks highly those
candidates supported at least to a certain extent by almost all voters, rather than candidates who are ranked highly only by the
simple majority of voters. In distance-based methods [17], the aggregate is deemed to be the ranking "closest" to the set of
votes, or at the smallest cumulative distance from the votes, where closeness of two rankings is measured via some adequately
chosen distance function. This approach can be thought of as finding the center of mass of the rankings, or the median
(centroid) of the rankings, with the rankings representing point masses in a metric space. Well-known distance measures for
rank aggregation include the Kendall $\tau$ distance, the Cayley distance, and Spearman's footrule.
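To make the score-based approach concrete, here is a minimal sketch of one common Borda variant, in which a candidate in position $p$ (counting from zero) of a vote over $n$ candidates receives $n - 1 - p$ points. The exact scoring rule, the tie-breaking by candidate label, and the function name `borda` are assumptions for illustration; the text above does not fix them.

```python
def borda(votes):
    """Rank candidates by total Borda score (highest first).

    Assumed scoring: position p (0-based) in a vote over n candidates
    earns n - 1 - p points. Ties are broken by candidate label.
    """
    n = len(votes[0])
    scores = {c: 0 for c in votes[0]}
    for vote in votes:
        for position, candidate in enumerate(vote):
            scores[candidate] += n - 1 - position
    return sorted(scores, key=lambda c: (-scores[c], c))

# Hypothetical votes over candidates 1, 2, 3.
print(borda([(1, 2, 3), (1, 3, 2), (2, 1, 3)]))
```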

Clearly, the most important aspect of distance-based rank aggregation is to choose an appropriate distance function. One
may argue that almost all problems arising in connection with the majority method or score-based approaches directly translate
into problems concerning the chosen distance measures. To address this issue, Kemeny [17], [18] presented a set of intuitively
justifiable axioms that a distance measure must satisfy to be deemed suitable for aggregation purposes, and showed that only
one distance measure satisfies the axioms, namely, the Kendall $\tau$ distance. The Kendall $\tau$ distance between two rankings
is the smallest number of swaps of adjacent elements that transforms one ranking into the other. For example, the Kendall
$\tau$ distance between the rankings (1, 3, 4, 2) and (1, 2, 3, 4) is two; we may first swap 2 and 4 and then 2 and 3 to transform
(1, 3, 4, 2) to (1, 2, 3, 4). Besides its use in social choice and computer science theory, the Kendall $\tau$ distance has also received
significant attention in the coding theory literature, due to its applications in modulation coding for flash memories [16], [19].
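The example above can be checked numerically. The sketch below counts the pairs of candidates on which two rankings disagree, which equals the minimum number of adjacent swaps (this equivalence is the content of Lemma 7 later in the paper). The function name `kendall_tau` is an assumption for illustration.

```python
def kendall_tau(pi, sigma):
    """Number of candidate pairs ranked in opposite order by pi and sigma."""
    position_in_sigma = {c: i for i, c in enumerate(sigma)}
    seq = [position_in_sigma[c] for c in pi]
    n = len(seq)
    # Count inversions of pi relative to sigma.
    return sum(seq[i] > seq[j] for i in range(n) for j in range(i + 1, n))

print(kendall_tau((1, 3, 4, 2), (1, 2, 3, 4)))
```

The quadratic double loop is kept for readability; a merge-sort based count would bring the cost down to $O(n \log n)$.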

Unfortunately, the Kendall $\tau$ distance is not a suitable distance measure for aggregation problems involving various electoral and
Internet search engine constraints. Two such important constraints include differentiating the significance of the top versus the 
bottom of a ranking and differentiating candidates based on their "similarity". 

In the first example, consider the following scenario. One may view the process of forming the aggregate ranking as one
of "tweaking" a starting ranking so as to make it as close as possible to all given voters' rankings. Under the Kendall $\tau$ distance, the effect of changing
the ordering of candidates at the top or at the bottom is in principle the same: if switching the top two elements in
the aggregate reduces the total distance from the votes by the same amount as switching the bottom two elements, then both
options are equally valid. But in many applications, changes at, or near, the top of rankings should not affect the
distance between rankings to the same extent as changes at, or near, the bottom of rankings. In other words, one should penalize
making changes at the top of the list more than making changes at the bottom of the list, given that low-ranked items are
usually not very relevant. So far, only a handful of results are known for rank aggregation distances that address the problem
of positional relevance, i.e., the significance of the top versus the bottom of rankings. One approach was described in [20],
where the proposed distances were based on heuristic arguments only. Such distances do not have axiomatic underpinnings,
and efficient aggregation algorithms to accompany them are not known.

In the second example, consider a voting process where candidates should be ranked both based on merit and on a diversity
criterion, for example, not having more than two of the top ten candidates working in information theory. One may argue that
in this case, using the Kendall $\tau$ distance for aggregation and reshuffling some candidates in order to satisfy the constraints
suffices to solve the problem. For example, one may move all except the two highest-ranked information theorists below
position ten and leave the ranking unchanged otherwise. It is clear that this procedure may not be viewed as fair, since ranks
of all candidates were affected by the rankings of information theorists in the first place. Alternatively, one may reduce the
search space only to rankings that satisfy the constraints, but this approach is computationally highly challenging.

Henceforth, we focus our attention on distance-based aggregation methods catering to constraints of the form described
above. The goal of our work is to provide an axiomatic underpinning for novel distance measures between rankings that take
into account predetermined top-bottom and similarity/diversity constraints. In addition to their applications in computer science
and social choice theory, these distance measures may be used in a variety of applications, ranging from bioinformatics to
network analysis [12].

Motivation - Top vs. Bottom 

Consider the ranking $\pi$ of the "World's 10 best cities to live in", according to a report composed by the Economist Intelligence
Unit [10]:

$\pi$ = (Melbourne, Vienna, Vancouver, Toronto, Calgary,
Adelaide, Sydney, Helsinki, Perth, Auckland).

Now consider two other rankings that both differ from $\pi$ by one swap of adjacent entries:

$\pi'$ = (Melbourne, Vienna, Vancouver, Calgary, Toronto,
Adelaide, Sydney, Helsinki, Perth, Auckland),

$\pi''$ = (Vienna, Melbourne, Vancouver, Toronto, Calgary,
Adelaide, Sydney, Helsinki, Perth, Auckland).

The astute reader probably immediately noticed that the top candidate was changed in $\pi''$, but otherwise took some time to
realize where the adjacent swap appeared in $\pi'$. This is a consequence of the well-known fact that humans pay more attention
to the top of the list than to any other location in the ranking, and hence notice changes in higher positions more easily.¹ Note
that the Kendall $\tau$ distance between $\pi$ and $\pi'$ and between $\pi$ and $\pi''$ is one, but it would appear reasonable to assume that the
distance between $\pi$ and $\pi''$ be larger than that between $\pi$ and $\pi'$, as the corresponding swap occurred in a more significant
(higher ranked) position in the list.

¹ Note that one may argue that people are equally drawn to explore the highest and lowest ranked items in a list. For example, if about a hundred cities
were ranked, it would be reasonable to assume that readers would be more interested in knowing the best and worst ten cities, rather than the cities occupying
positions 41 to 60. These positional differences may also be addressed within the framework proposed in the paper.
Figure 1: Clickthrough rates (CTRs) of webpages appearing on the first page of Google search.



The second example corresponds to the well-studied notion of Clickthrough rates (CTRs) of webpages in search engine 
results pages (SERPs). The CTR is used to assess the popularity of a webpage or the success rate of an online ad. It may be 
roughly defined as the number of times a link is clicked on divided by the total number of times that it appeared. A recent study 
by Optify Inc. [22] showed that the difference between the average CTR of the first (highest-ranked) result and the average
CTR of the second (runner-up) result is very large, and much larger than the corresponding difference between the average
CTRs of the lower ranked items (see Figure 1). Hence, in terms of directing search engine traffic, swapping higher-ranked
adjacent pairs of search results has a larger effect on the performance of Internet services than swapping lower-ranked search 
results. 

The aforementioned findings should be considered when forming an aggregate ranking of webpages. For example, in studies 
of CTRs, one is often faced with questions regarding traffic flow from search engines to webpages. One may think of a set of 
keywords, each producing a different ranking of possible webpages, with the aggregate representing the median ranking based 
on different sets of keywords. Based on Figure 1, if the ranking of a webpage is in the bottom half, its exact position is not
as relevant as when it is ranked in the top half. Furthermore, a webpage appearing roughly half of the time at the top and 
roughly half of the time at the bottom will generate more incoming traffic than a webpage with persistent average ranking. 

Throughout the paper, we refer to the above-mentioned problem as the "top-vs-bottom" problem. Besides the importance in 
emphasizing the relevance of the top of the list, distance measures that penalize perturbations at the top of the list more than 
perturbations at the bottom of the list have another important application in practice - to eliminate negative outliers. As will 
be shown in subsequent sections, top-vs-bottom distance measures allow candidates to be highly ranked in the aggregate even 
though they have a certain (small) number of highly negative ratings. The policy of eliminating outliers before rating items or 
individuals is a well-known one, but has not been considered in the social choice literature in the context of distance-based 
rank aggregation. 

Motivation - Similarity of Candidates 

In many vote aggregation problems, the identity of the candidates may not be known. On the other hand, many other 
applications require that the identity of the candidates be revealed. In this case, candidates are frequently partitioned in terms 
of some similarity criteria - for example, area of expertise, gender, working hour schedule etc. Hence, pairs of candidates may 
have different degrees of similarity and swapping candidates that are similar should be penalized less than swapping candidates 
that are not similar according to the given ranking criteria. For example, in a faculty search ranking one may want to have 
at least one but not more than two physicists ranked among the top 10 candidates, or at least two women among the top 5 
candidates. 

Pertaining to the Economist Intelligence Unit ranking, one may also consider the identity of the elements that are swapped, 
and not only their position. In this case, it may be observed that the swap in $\pi''$ involves cities on two different continents,
which may shift the general opinion about the cities' countries of origin. On the other hand, the two cities swapped in $\pi'$ are
both in Canada, so that the swap is not likely to change the perception of quality of living in that country. This points to the 
need for distance measures that take into account similarities and dissimilarities among candidates. 

Distance measures capable of integrating these criteria directly into the aggregation process are not known in the literature. A
class of distance measures introduced by the authors in [12], termed weighted transposition distance, is suitable for this task as
it can take into account similarities and dissimilarities of candidates. The weighted transposition distance can be viewed as a
generalization of both the previously described Kendall $\tau$ distance and the so-called Cayley distance between permutations. The Cayley
distance between two rankings is the smallest number of (not necessarily adjacent) swaps required to transform one ranking
into the other. For example, the Cayley distance between the permutations (1, 2, 3, 4) and (1, 4, 3, 2) is one, since the former
can be transformed into the latter by swapping the elements 2 and 4. Note that while the Cayley distance allows for arbitrary
swaps, the Kendall $\tau$ distance allows for swaps of adjacent elements only. It is straightforward to see that every statement
made about swapping elements i and j may be converted to a statement about swapping elements at positions i and j by
using the inverse of the ranking/permutation. Similarity of items may be captured by assigning costs or weights to swaps, and
choosing the transposition weights so that swapping dissimilar items induces a higher weight/distance compared to swapping
similar items. This approach is the topic of the next two sections.
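The (unweighted) Cayley distance can be computed from cycle structure: it is a standard fact that the minimum number of swaps equals $n$ minus the number of cycles of the permutation that maps one ranking onto the other. The following is a minimal sketch under that fact; the function name `cayley` is an assumption for illustration.

```python
def cayley(pi, sigma):
    """Minimum number of (not necessarily adjacent) swaps turning pi into sigma."""
    position_in_sigma = {c: i for i, c in enumerate(sigma)}
    # perm[i] is where the item at position i of pi has to end up.
    perm = [position_in_sigma[c] for c in pi]
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            i = start
            while not seen[i]:
                seen[i] = True
                i = perm[i]
    return n - cycles

print(cayley((1, 2, 3, 4), (1, 4, 3, 2)))
```

For the pair (1, 3, 4, 2) and (1, 2, 3, 4) both the Cayley and the Kendall $\tau$ distances happen to equal two, whereas for (1, 2, 3, 4) and (1, 4, 3, 2) the Cayley distance is one and the Kendall $\tau$ distance is three.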

To address the top-vs-bottom and similarity issues, we axiomatically describe a class of distance functions by assigning
different weights to different adjacent and non-adjacent swaps, termed the weighted Kendall and weighted transposition distance
measures, respectively. Furthermore, we show that the proposed distance functions can be computed in polynomial time in
some special cases and provide a polynomial-time 2-approximation algorithm for the general case. The results we present also
pertain to algorithmic aspects of rank aggregation [11], [24], [25].

In this setting, we describe the performance of an algorithm for rank aggregation based on a generalization of Spearman's
footrule distance and solving a minimum weight matching problem (this algorithm is inspired by a procedure described in [8])
and a combination of the matching algorithm with local descent methods. Furthermore, we describe an algorithm reminiscent
of PageRank [8], where the "hyperlink probabilities" are chosen according to swapping likelihoods (weights).

The remainder of the paper is organized as follows. An overview of relevant concepts, definitions, and terminology is
presented in Section II. Weighted Kendall distance measures, as well as their axiomatic definitions, are presented in Section III.
Section IV is devoted to the weighted transposition distance and its computational aspects. Novel rank aggregation algorithms
for the weighted Kendall and weighted transposition distances are presented in Section V.

II. Preliminaries 

Formally, a ranking is a list of candidates arranged in order of preference, with the first candidate being the most preferred 
and the last candidate being the least preferred one. 

Consider the set of all possible rankings of a set of $n$ candidates. Via an arbitrary, but fixed, injective mapping from the set
of candidates to $\{1, 2, \ldots, n\} = [n]$, each ranking may be represented as a permutation. The mapping is often implicit and
we usually equate rankings of $n$ candidates with permutations in $\mathbb{S}_n$, where $\mathbb{S}_n$ denotes the symmetric group on $n$ elements. This
is equivalent to assuming that the set of candidates is the set $[n]$. For notational convenience, we use Greek lower-case letters
for permutations, and explicitly write permutations as ordered sets $\sigma = (\sigma(1), \ldots, \sigma(n))$.

Let $e$ denote the identity permutation $(1, 2, \ldots, n)$. For two permutations $\pi, \sigma \in \mathbb{S}_n$, the product $\mu = \pi\sigma$ is defined via the
identity $\mu(i) = \pi(\sigma(i))$, $i = 1, 2, \ldots, n$.

Definition 1. A transposition $\tau = (a\ b)$, for $a, b \in [n]$ and $a \neq b$, is a permutation that swaps $a$ and $b$ and keeps all other
elements of $e$ fixed. That is,

\[
\tau(i) =
\begin{cases}
b, & i = a, \\
a, & i = b, \\
i, & \text{else.}
\end{cases}
\]

If $|a - b| = 1$, the transposition is referred to as an adjacent transposition.

Note that for $\pi \in \mathbb{S}_n$, $\pi (a\ b)$ is obtained from $\pi$ by swapping elements in positions $a$ and $b$, and $(a\ b)\pi$ is obtained by
swapping $a$ and $b$ in $\pi$. For example, $(3, 1, 4, 2)(2\ 3) = (3, 4, 1, 2)$ and $(2\ 3)(3, 1, 4, 2) = (2, 1, 4, 3)$.
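The difference between multiplying by a transposition on the right (swap by position) and on the left (swap by value) is easy to confuse, so a small sketch may help. Permutations are stored as 1-based tuples; the helper names `compose` and `transposition` are assumptions for illustration.

```python
def compose(pi, sigma):
    """Product pi*sigma defined by (pi*sigma)(i) = pi(sigma(i))."""
    return tuple(pi[sigma[i] - 1] for i in range(len(sigma)))

def transposition(n, a, b):
    """The transposition (a b) in S_n, written in one-line notation."""
    t = list(range(1, n + 1))
    t[a - 1], t[b - 1] = b, a
    return tuple(t)

pi = (3, 1, 4, 2)
t23 = transposition(4, 2, 3)
print(compose(pi, t23))  # swaps the entries in positions 2 and 3 of pi
print(compose(t23, pi))  # swaps the values 2 and 3 in pi
```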
For our future analysis, we define the set

\[
A(\pi, \sigma) = \left\{ \tau = (\tau_1, \ldots, \tau_{|\tau|}) : \sigma = \pi \tau_1 \cdots \tau_{|\tau|},\ \tau_i = (a_i\ a_i + 1),\ i \in [|\tau|] \right\},
\]

i.e., the set of all ordered sequences of adjacent transpositions that transform $\pi$ into $\sigma$. The fact that $A(\pi, \sigma)$ is non-empty, for
any $\pi, \sigma \in \mathbb{S}_n$, is obvious. Using $A(\pi, \sigma)$, the Kendall $\tau$ distance between two permutations $\pi$ and $\sigma$, denoted by $K(\pi, \sigma)$,
may be written as

\[
K(\pi, \sigma) = \min_{\tau \in A(\pi, \sigma)} |\tau|.
\]

For a ranking $\pi \in \mathbb{S}_n$ and $a, b \in [n]$, $\pi$ is said to rank $a$ before $b$ or higher than $b$ if $\pi^{-1}(a) < \pi^{-1}(b)$. We denote this
relationship as $a <_\pi b$. Two rankings $\pi$ and $\sigma$ agree on the relative order of a pair $\{a, b\}$ of elements if both rank $a$ before $b$
or both rank $b$ before $a$. Furthermore, the two rankings $\pi$ and $\sigma$ disagree on the relative order of a pair $\{a, b\}$ if one ranks $a$
before $b$ and the other ranks $b$ before $a$. For example, consider $\pi = (1, 2, 3, 4)$ and $\sigma = (4, 2, 1, 3)$. We have that $4 <_\sigma 1$ and
that $\pi$ and $\sigma$ agree on $\{2, 3\}$ but disagree on $\{1, 2\}$.

Given a distance function $d$ over the permutations in $\mathbb{S}_n$ and a set $\Sigma = \{\sigma_1, \ldots, \sigma_m\}$ of $m$ votes (rankings), the distance-based
aggregation problem can be stated as follows: find the ranking $\pi^*$ that minimizes the cumulative distance from $\Sigma$,
i.e.,

\[
\pi^* = \arg\min_{\pi \in \mathbb{S}_n} \sum_{i=1}^{m} d(\pi, \sigma_i). \tag{1}
\]

In words, the goal is to find a ranking $\pi^*$ that represents the median of the set of permutations $\Sigma$. The choice of the distance
function $d$ is an important aspect of distance-based rank aggregation and the focus of the paper.
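For very small $n$, problem (1) can be solved by exhaustive search, which is useful as a reference when testing heuristics. The sketch below does exactly that; it is not one of the algorithms proposed in the paper, and the names `aggregate` and `kendall_tau` as well as the sample votes are assumptions for illustration. The search visits all $n!$ rankings, so it is only practical for a handful of candidates.

```python
from itertools import permutations

def kendall_tau(pi, sigma):
    """Number of candidate pairs on which pi and sigma disagree."""
    pos = {c: i for i, c in enumerate(sigma)}
    seq = [pos[c] for c in pi]
    n = len(seq)
    return sum(seq[i] > seq[j] for i in range(n) for j in range(i + 1, n))

def aggregate(votes, dist=kendall_tau):
    """Brute-force solution of (1): a ranking minimizing the summed distance.

    If several rankings attain the minimum, the lexicographically
    smallest one is returned.
    """
    candidates = sorted(votes[0])
    return min(permutations(candidates),
               key=lambda p: sum(dist(p, v) for v in votes))

# Hypothetical votes.
print(aggregate([(1, 2, 3), (1, 2, 3), (2, 1, 3)]))
```

Any other distance function with the same two-argument signature can be passed as `dist`.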

In [17], Kemeny presented a set of axioms that a distance function for rank aggregation should satisfy and proved that the
only distance that satisfies the axioms is the Kendall $\tau$ distance. A critical concept in Kemeny's axioms is the idea of "betweenness,"
defined below.

Definition 2. A ranking $\omega$ is said to be between two rankings $\pi$ and $\sigma$, denoted by $\pi\text{-}\omega\text{-}\sigma$, if for each pair of elements $\{a, b\}$,
$\omega$ either agrees with $\pi$ or $\sigma$ or both. The rankings $\pi_1, \pi_2, \ldots, \pi_s$ are said to be on a line, denoted by $\pi_1\text{-}\pi_2\text{-}\cdots\text{-}\pi_s$, if for
every $i$, $j$, and $k$ for which $1 \le i < j < k \le s$, we have $\pi_i\text{-}\pi_j\text{-}\pi_k$.
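Definition 2 translates directly into a pairwise check. The following minimal sketch tests betweenness and the "on a line" property; the function names `ranks_before`, `is_between`, and `on_a_line` are assumptions for illustration.

```python
from itertools import combinations

def ranks_before(pi, a, b):
    """True if ranking pi places a before b."""
    return pi.index(a) < pi.index(b)

def is_between(pi, omega, sigma):
    """True if, on every pair, omega agrees with pi or with sigma (or both)."""
    for a, b in combinations(pi, 2):
        w = ranks_before(omega, a, b)
        if w != ranks_before(pi, a, b) and w != ranks_before(sigma, a, b):
            return False
    return True

def on_a_line(rankings):
    """True if every ordered triple of the sequence satisfies betweenness."""
    return all(is_between(rankings[i], rankings[j], rankings[k])
               for i, j, k in combinations(range(len(rankings)), 3))

line = [(3, 2, 1), (2, 3, 1), (2, 1, 3), (1, 2, 3)]
print(on_a_line(line))
print(is_between((1, 2, 3), (3, 2, 1), (1, 3, 2)))
```

The first sequence reverses one pair at each step and never reverses a pair twice, so it lies on a line; in the second call the middle ranking disagrees with both endpoints on the pair {1, 2}.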

In Kemeny's work, rankings are allowed to have ties. The basis of our subsequent analysis is the same set of axioms, listed 
below. However, our focus is on rankings without ties, in other words, permutations.

Axioms I 

1) d is a metric. 

2) d is left-invariant. 

3) For any $\pi$, $\sigma$, and $\omega$, $d(\pi, \sigma) = d(\pi, \omega) + d(\omega, \sigma)$ if and only if $\omega$ is between $\pi$ and $\sigma$.

4) The smallest positive distance is one.

Axiom 2 states that relabeling of objects should not change the distance between permutations. In other words, $d(\sigma\pi, \sigma\omega) = d(\pi, \omega)$, for any $\pi, \sigma, \omega \in \mathbb{S}_n$. Axiom 3 may be viewed through a geometric lens: the triangle inequality has to be satisfied
with equality for all points that lie on a line between $\pi$ and $\sigma$. Axiom 4 is only used for normalization purposes.

Kemeny's original exposition included a fifth axiom which we state for completeness: If two rankings $\pi$ and $\sigma$ agree except
for a segment of $k$ elements, the position of the segment does not affect the distance between the rankings. Here, a segment
represents a set of objects that are ranked consecutively, i.e., a substring of the permutation. As an example, this axiom
implies that

\[
d\big((1, 2, 3, \underbrace{4, 5, 6}),\ (1, 2, 3, \underbrace{6, 5, 4})\big) = d\big((1, \underbrace{4, 5, 6}, 2, 3),\ (1, \underbrace{6, 5, 4}, 2, 3)\big),
\]

where the segment is underscored by braces. This axiom clearly enforces a property that is not desirable for metrics designed 
to address the top-vs-bottom issue: changing the position of the segment in two permutations does not alter their mutual 
distance. One may hence believe that removing this axiom (as was done in Axioms I) will lead to distance measures capable 
of handling the top-vs-bottom problem. But as we show below, for rankings without ties, omitting this axiom does not change 
the outcome of Kemeny's analysis. In other words, the axiom is redundant. This is a rather surprising fact, and we conjecture 
that the same is true of rankings with ties. 

In the remainder of this section, we demonstrate the redundancy of Kemeny's fifth axiom and use our novel proof method to 
identify how to change the axioms in Axioms I in order to arrive at distance measures that cater to the need of top-vs-bottom 
and similarity problems. For reasons that will become clear in the next section, we refer to distance measures resulting from 
such axioms as weighted distances. 

The main result of this section is Theorem 8, stating that the unique distance satisfying Axioms I is the Kendall $\tau$ distance.
The theorem is proved with the help of Lemmas 3, 4, 5, 6, and 7.

Lemma 3. For any distance measure $d$ that satisfies Axioms I, and for any sequence of permutations $\pi_1, \pi_2, \ldots, \pi_s$ such that
$\pi_1\text{-}\pi_2\text{-}\cdots\text{-}\pi_s$, one has

\[
d(\pi_1, \pi_s) = \sum_{k=1}^{s-1} d(\pi_k, \pi_{k+1}).
\]

Proof: The lemma follows from Axiom I.3 by induction. ■
Lemma 4. For any $d$ that satisfies Axioms I and for $i \in [n - 1]$, we have

\[
d((i\ i+1), e) = d((1\ 2), e).
\]

Proof: We first show that $d((2\ 3), e) = d((1\ 2), e)$. Repeating the same argument used for proving this special case gives
$d((i\ i+1), e) = d((i-1\ i), e) = \cdots = d((1\ 2), e)$.

To show that $d((2\ 3), e) = d((1\ 2), e)$, we evaluate $d(\pi, e)$ in two ways, where we choose $\pi = (3, 2, 1, 4, 5, \ldots, n)$.

On the one hand, note that $\pi\text{-}\omega\text{-}\eta\text{-}e$, where $\omega = \pi(1\ 2) = (2, 3, 1, 4, 5, \ldots, n)$ and $\eta = \omega(2\ 3) = (2, 1, 3, 4, 5, \ldots, n)$. As
a result,

\begin{align}
d(\pi, e) &= d(\pi, \omega) + d(\omega, \eta) + d(\eta, e) \nonumber \\
&= d(\omega^{-1}\pi, e) + d(\eta^{-1}\omega, e) + d(\eta, e) \nonumber \\
&= d((1\ 2), e) + d((2\ 3), e) + d((1\ 2), e), \tag{2}
\end{align}

where the first equality follows from Lemma 3 while the second is a consequence of the left-invariance property of the distance
measure.

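On the other hand, note that $\pi\text{-}\alpha\text{-}\beta\text{-}e$, where $\alpha = \pi(2\ 3) = (3, 1, 2, 4, 5, \ldots, n)$ and $\beta = \alpha(1\ 2) = (1, 3, 2, 4, 5, \ldots, n)$.
For this case,

\begin{align}
d(\pi, e) &= d(\pi, \alpha) + d(\alpha, \beta) + d(\beta, e) \nonumber \\
&= d(\alpha^{-1}\pi, e) + d(\beta^{-1}\alpha, e) + d(\beta, e) \nonumber \\
&= d((2\ 3), e) + d((1\ 2), e) + d((2\ 3), e). \tag{3}
\end{align}

Equations (2) and (3) imply that $d((2\ 3), e) = d((1\ 2), e)$. ■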
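Spelled out, the final step of this proof is a one-line subtraction. Writing $x = d((1\ 2), e)$ and $y = d((2\ 3), e)$ as shorthand (these two symbols are introduced here only for illustration), equations (2) and (3) read:

```latex
\begin{align*}
d(\pi, e) &= 2x + y && \text{from (2)}, \\
d(\pi, e) &= x + 2y && \text{from (3)}, \\
2x + y &= x + 2y \;\Longrightarrow\; x = y .
\end{align*}
```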

Lemma 5. For any $d$ that satisfies Axioms I, $d(\gamma, e)$ equals the minimum number of adjacent transpositions required to
transform $\gamma$ into $e$.

Proof: Let

\[
L(\pi, \sigma) = \left\{ \tau = (\tau_1, \ldots, \tau_{|\tau|}) \in A(\pi, \sigma) : \pi\text{-}\pi\tau_1\text{-}\pi\tau_1\tau_2\text{-}\cdots\text{-}\sigma \right\}
\]

be the subset of $A(\pi, \sigma)$ consisting of sequences of transpositions that transform $\pi$ into $\sigma$ by passing through a line. Let $s$ be
the minimum number of adjacent transpositions that transform $\gamma$ into $e$. Furthermore, let $(\tau_1, \tau_2, \ldots, \tau_s) \in A(\gamma, e)$ and define
$\gamma_i = \gamma\tau_1 \cdots \tau_i$, $i = 0, \ldots, s$, with $\gamma_0 = \gamma$ and $\gamma_s = e$.

First, we show $\gamma_0\text{-}\gamma_1\text{-}\cdots\text{-}\gamma_s$, that is,

\[
(\tau_1, \tau_2, \ldots, \tau_s) \in L(\gamma, e). \tag{4}
\]

Suppose this were not the case. Then, there exist $i < j < k$ such that $\gamma_i$, $\gamma_j$, and $\gamma_k$ are not on a line, and thus, there
exists a pair $\{r, s\}$ for which $\gamma_j$ disagrees with both $\gamma_i$ and $\gamma_k$. Hence, there exist two transpositions, $\tau_{i'}$ and $\tau_{j'}$, with
$i < i' \le j$ and $j < j' \le k$, that swap $r$ and $s$. We can in this case remove $\tau_{i'}$ and $\tau_{j'}$ from $(\tau_1, \ldots, \tau_s)$ to obtain
$(\tau_1, \ldots, \tau_{i'-1}, \tau_{i'+1}, \ldots, \tau_{j'-1}, \tau_{j'+1}, \ldots, \tau_s) \in A(\gamma, e)$ with length $s - 2$. This contradicts the optimality of the choice of $s$.

Hence, $(\tau_1, \tau_2, \ldots, \tau_s) \in L(\gamma, e)$. Then Lemma 3 implies that

\[
d(\gamma, e) = \sum_{i=1}^{s} d(\tau_i, e). \tag{5}
\]

Lemma 4 states that all adjacent transpositions have the same distance from the identity. Since the transpositions $\tau_i$, $1 \le i \le s$,
in (5) are adjacent transpositions, $d(\tau_i, e) = a$ for some $a > 0$ and thus $d(\gamma, e) = sa$.

In (5), the minimum positive distance is obtained when $s = 1$. That is, the minimum positive distance from the identity equals $a$
and is obtained when $\gamma$ is an adjacent transposition. Axiom I.4 states that the minimum positive distance is 1. By left-invariance,
this axiom implies that the minimum positive distance of any permutation from the identity is 1. Hence, $a = 1$ and for any
$\gamma \in \mathbb{S}_n$,

\[
d(\gamma, e) = \sum_{i=1}^{s} d(\tau_i, e) = sa = s.
\]

■

Lemma 6. For any $d$ that satisfies Axioms I, and for $\pi, \sigma \in \mathbb{S}_n$, we have

\[
d(\pi, \sigma) = \min \{ s : (\tau_1, \ldots, \tau_s) \in A(\pi, \sigma) \}.
\]

Proof: We have $(\tau_1, \ldots, \tau_s) \in A(\pi, \sigma)$ if and only if $(\tau_1, \ldots, \tau_s) \in A(\sigma^{-1}\pi, e)$.
Left-invariance of $d$ implies that $d(\pi, \sigma) = d(\sigma^{-1}\pi, e)$. Hence,

\begin{align*}
d(\pi, \sigma) &= d(\sigma^{-1}\pi, e) \\
&= \min \{ s : (\tau_1, \ldots, \tau_s) \in A(\sigma^{-1}\pi, e) \} \\
&= \min \{ s : (\tau_1, \ldots, \tau_s) \in A(\pi, \sigma) \},
\end{align*}

where the second equality follows from Lemma 5. ■
For $\pi, \sigma \in \mathbb{S}_n$, let

\[
I(\pi, \sigma) = \{ \{i, j\} : i <_\pi j,\ j <_\sigma i \}
\]

be the set of pairs $\{i, j\}$ on which $\pi$ and $\sigma$ disagree. The number $|I(\pi, \sigma)|$ is usually referred to as the number of inversions
between the two permutations.

The following lemma shows that the Kendall $\tau$ distance between two permutations $\pi$ and $\sigma$ equals the number of inversions
between them. The result of the lemma is known, but a sketch of a proof is provided for completeness.

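Lemma 7. For $\pi, \sigma \in \mathbb{S}_n$,

\[
K(\pi, \sigma) = |I(\pi, \sigma)|.
\]

Proof: Consider a sequence $\tau_1, \ldots, \tau_k$ of adjacent transpositions that transforms $\pi$ into $\sigma$, i.e., $\sigma = \pi\tau_1 \cdots \tau_k$, with
$k = K(\pi, \sigma)$. Let $\pi_j = \pi\tau_1 \cdots \tau_j$. Each $\tau_j$ decreases the number of inversions by at most one. Hence,

\[
|I(\pi_j, \sigma)| \ge |I(\pi_{j-1}, \sigma)| - 1
\]

and thus

\[
0 = |I(\pi_k, \sigma)| \ge |I(\pi, \sigma)| - k.
\]

Since $k = K(\pi, \sigma)$, we obtain

\[
K(\pi, \sigma) \ge |I(\pi, \sigma)|.
\]

On the other hand, it is easy to see that one can find $\tau_i$, $i \in [k]$, in such a way that each $\tau_i$ decreases the number of inversions
by one. For example, Bubble Sort [3] is one such well-known algorithm for accomplishing this task. Hence,

\[
K(\pi, \sigma) \le |I(\pi, \sigma)|,
\]

and the two inequalities together give $K(\pi, \sigma) = |I(\pi, \sigma)|$. ■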
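The Bubble Sort argument can be checked empirically: every swap that Bubble Sort performs removes exactly one inversion, so the swap count matches the inversion count. The sketch below compares the two quantities; the function names `bubble_sort_swaps` and `inversions` are assumptions for illustration.

```python
from itertools import permutations

def inversions(pi, sigma):
    """|I(pi, sigma)|: number of pairs on which pi and sigma disagree."""
    pos = {c: i for i, c in enumerate(sigma)}
    seq = [pos[c] for c in pi]
    n = len(seq)
    return sum(seq[i] > seq[j] for i in range(n) for j in range(i + 1, n))

def bubble_sort_swaps(pi, sigma):
    """Number of adjacent swaps Bubble Sort uses to turn pi into sigma."""
    pos = {c: i for i, c in enumerate(sigma)}
    seq = [pos[c] for c in pi]  # sorting seq is the same as reaching sigma
    swaps = 0
    for end in range(len(seq) - 1, 0, -1):
        for i in range(end):
            if seq[i] > seq[i + 1]:
                seq[i], seq[i + 1] = seq[i + 1], seq[i]
                swaps += 1
    return swaps

sigma = (1, 2, 3, 4)
print(all(bubble_sort_swaps(p, sigma) == inversions(p, sigma)
          for p in permutations(sigma)))
```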

Theorem 8. The unique distance $d$ that satisfies Axioms I is

\[
K(\pi, \sigma) = \min \{ s : (\tau_1, \ldots, \tau_s) \in A(\pi, \sigma) \}.
\]

Proof: We show below that $K$ satisfies Axiom I.3, as proving that $K$ satisfies the other axioms is straightforward.
Uniqueness follows from Lemma 6.

To show that $K$ satisfies Axiom I.3, we use Lemma 7, stating that

\[
K(\pi, \sigma) = |I(\pi, \sigma)|.
\]

Fix $\pi, \sigma \in \mathbb{S}_n$. For any $\omega \in \mathbb{S}_n$, it is clear that

\[
I(\pi, \sigma) \subseteq I(\pi, \omega) \cup I(\omega, \sigma). \tag{6}
\]

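Suppose first that $\omega$ is not between $\pi$ and $\sigma$. Then there exists a pair $\{a, b\}$ with $a <_\pi b$ and $a <_\sigma b$ but with $b <_\omega a$. Since
$\{a, b\} \notin I(\pi, \sigma)$ but $\{a, b\} \in I(\pi, \omega) \cup I(\omega, \sigma)$, we find that

\[
|I(\pi, \sigma)| < |I(\pi, \omega) \cup I(\omega, \sigma)|,
\]

and thus

\begin{align*}
K(\pi, \sigma) &= |I(\pi, \sigma)| \\
&< |I(\pi, \omega) \cup I(\omega, \sigma)| \\
&\le |I(\pi, \omega)| + |I(\omega, \sigma)| \\
&= K(\pi, \omega) + K(\omega, \sigma).
\end{align*}

Hence, if $\omega$ is not between $\pi$ and $\sigma$, then

\[
K(\pi, \sigma) \neq K(\pi, \omega) + K(\omega, \sigma).
\]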

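Next, suppose $\omega$ is between $\pi$ and $\sigma$. This immediately implies that $I(\pi, \omega) \subseteq I(\pi, \sigma)$ and $I(\omega, \sigma) \subseteq I(\pi, \sigma)$. These
relations, along with (6), imply that

\[
I(\pi, \omega) \cup I(\omega, \sigma) = I(\pi, \sigma). \tag{7}
\]

We claim that $I(\pi, \omega) \cap I(\omega, \sigma) = \emptyset$. To see this, observe that if $\{a, b\} \in I(\pi, \omega) \cap I(\omega, \sigma)$, then the relative rankings of $a$
and $b$ are the same for $\pi$ and $\sigma$ and so $\{a, b\} \notin I(\pi, \sigma)$. The last statement contradicts (7) and thus

\[
I(\pi, \omega) \cap I(\omega, \sigma) = \emptyset. \tag{8}
\]

From (7) and (8), we may write

\begin{align*}
K(\pi, \sigma) &= |I(\pi, \sigma)| \\
&= |I(\pi, \omega) \cup I(\omega, \sigma)| \\
&= |I(\pi, \omega)| + |I(\omega, \sigma)| \\
&= K(\pi, \omega) + K(\omega, \sigma),
\end{align*}

and this completes the proof of the fact that $K$ satisfies Axiom I.3. ■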
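As a sanity check on the argument above (not a substitute for the proof), one can verify Axiom I.3 exhaustively for small $n$: the inversion count is additive through $\omega$ exactly when no pair is inverted in both $I(\pi, \omega)$ and $I(\omega, \sigma)$, which is the betweenness condition. The function names `disagreements` and `check_axiom_I3` are assumptions for illustration.

```python
from itertools import combinations, permutations

def disagreements(pi, sigma):
    """The set I(pi, sigma) of pairs on which the two rankings disagree."""
    p = {c: i for i, c in enumerate(pi)}
    s = {c: i for i, c in enumerate(sigma)}
    return {frozenset((a, b)) for a, b in combinations(pi, 2)
            if (p[a] < p[b]) != (s[a] < s[b])}

def check_axiom_I3(n):
    """True if K(pi,sigma) = K(pi,omega) + K(omega,sigma) holds exactly
    when omega is between pi and sigma, for all triples in S_n."""
    perms = list(permutations(range(1, n + 1)))
    inv = {(p, q): disagreements(p, q) for p in perms for q in perms}
    for pi in perms:
        for sigma in perms:
            for omega in perms:
                additive = (len(inv[pi, sigma])
                            == len(inv[pi, omega]) + len(inv[omega, sigma]))
                # omega is between pi and sigma iff it never disagrees
                # with both of them on the same pair.
                between = not (inv[pi, omega] & inv[omega, sigma])
                if additive != between:
                    return False
    return True

print(check_axiom_I3(3), check_axiom_I3(4))
```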
A distance $d$ over $\mathbb{S}_n$ is called a graphic distance [6] if there exists a graph $G$ with vertex set $\mathbb{S}_n$ such that for $\pi, \sigma \in \mathbb{S}_n$,
$d(\pi, \sigma)$ is equal to the length of the shortest path between $\pi$ and $\sigma$ in $G$. Note that this definition implies that the edge set of
$G$ is the set

\[
\{ (\alpha, \beta) : \alpha, \beta \in \mathbb{S}_n,\ d(\alpha, \beta) = 1 \}.
\]

The Kendall $\tau$ distance is a graphic distance. To see the validity of this claim, take the corresponding graph to have vertices
indexed by permutations, with an edge between each pair of permutations that differ by only one adjacent transposition.
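The graphic-distance view can be illustrated with a breadth-first search over the graph just described. The sketch below computes shortest-path lengths from a source permutation and can be compared against inversion counts; the function names `neighbors` and `graph_distances` are assumptions for illustration, and the search enumerates all $n!$ vertices, so it is meant for small $n$ only.

```python
from collections import deque

def neighbors(pi):
    """Permutations that differ from pi by one adjacent transposition."""
    for i in range(len(pi) - 1):
        q = list(pi)
        q[i], q[i + 1] = q[i + 1], q[i]
        yield tuple(q)

def graph_distances(source):
    """Shortest-path length from source to every permutation (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        for nxt in neighbors(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist

dist = graph_distances((1, 2, 3, 4))
print(dist[(1, 3, 4, 2)], dist[(4, 3, 2, 1)])
```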

In the next section, we introduce the weighted Kendall distance which may be viewed as the shortest path between 
permutations over a weighted graph, and show how this distance arises from modifying Kemeny's axioms. 
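As a rough preview of the shortest-path view of the weighted Kendall distance, the sketch below runs Dijkstra's algorithm on the same graph of permutations, with a cost attached to each adjacent transposition by position. The weight vectors used here are hypothetical values chosen for illustration, the function name `weighted_kendall` is an assumption, and the exhaustive search is only practical for small $n$; it is not the computation method developed in the paper.

```python
import heapq

def weighted_kendall(pi, sigma, weights):
    """Cheapest total cost of turning pi into sigma by adjacent swaps.

    weights[i] is the assumed non-negative cost of swapping the entries
    in positions i and i+1 (0-based), so len(weights) == len(pi) - 1.
    """
    pi, sigma = tuple(pi), tuple(sigma)
    best = {pi: 0}
    heap = [(0, pi)]
    while heap:
        cost, cur = heapq.heappop(heap)
        if cur == sigma:
            return cost
        if cost > best[cur]:
            continue  # stale heap entry
        for i, w in enumerate(weights):
            nxt = list(cur)
            nxt[i], nxt[i + 1] = nxt[i + 1], nxt[i]
            nxt = tuple(nxt)
            if cost + w < best.get(nxt, float("inf")):
                best[nxt] = cost + w
                heapq.heappush(heap, (cost + w, nxt))
    raise ValueError("sigma is not a rearrangement of pi")

e = (1, 2, 3, 4)
top_heavy = [3, 2, 1]  # hypothetical: swaps near the top cost more
print(weighted_kendall(e, (2, 1, 3, 4), top_heavy))
print(weighted_kendall(e, (1, 2, 4, 3), top_heavy))
```

With unit weights the function reduces to the Kendall $\tau$ distance, while with the decreasing weights above a swap at the top of the list costs more than a swap at the bottom, which is the behavior motivated by the top-vs-bottom problem.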

III. The Weighted Kendall Distance 

The proof of the uniqueness of the Kendall $\tau$ distance under Axioms I reveals an important insight: the Kendall $\tau$ distance
arises due to the fact that adjacent transpositions have uniform costs, which is a consequence of the betweenness property
described in one of the axioms. If one had a ranking problem in which weights of transpositions depended either on the identity
of the elements involved in the transposition or on their positions, the uniformity assumption would have to be changed. As we
show below, a way to achieve this goal is to redefine the axioms in terms of the betweenness property.

Axioms II 

1) $d$ is a pseudo-metric, i.e., a generalized metric in which two distinct points may be at distance zero.

2) $d$ is left-invariant.

3) For any $\pi, \sigma$ disagreeing on more than one pair of elements, there exists some $\omega$, distinct from $\pi$ and $\sigma$ and between them, such that $d(\pi,\sigma) = d(\pi,\omega) + d(\omega,\sigma)$.

Axiom II.1 allows for the option that some transpositions are not penalized or counted, due to the side constraints of the voting
process. Intuitively, Axiom II.3 states that there exists at least one point on some line between $\pi$ and $\sigma$ for which the triangle
inequality is an equality. In other words, there exists one "shortest line" between two permutations, and not all straight lines
are required to be of the same length (see Figure 2 for an illustration).

Lemma 9. For any distance $d$ that satisfies Axioms II, and for distinct $\pi$ and $\sigma$, we have

\[ d(\pi,\sigma) = \min_{(\tau_1,\ldots,\tau_s) \in A(\pi,\sigma)} \sum_{i=1}^{s} d(\tau_i, e). \]

Proof: The proof follows by induction on $K(\pi,\sigma)$, the Kendall $\tau$ distance between $\pi$ and $\sigma$.

First, suppose that $K(\pi,\sigma) = 1$, i.e., $\pi$ and $\sigma$ disagree on one pair of adjacent elements. Then, we have $\sigma = \pi(a\ a+1)$ for some $a \in [n-1]$. For each $(\tau_1,\ldots,\tau_s) \in A(\pi,\sigma)$, there exists an index $j$ such that $\tau_j = (a\ a+1)$ and thus

\[ \sum_{i=1}^{s} d(\tau_i, e) \ge d(\tau_j, e) = d((a\ a+1), e), \]

implying

\[ \min_{(\tau_1,\ldots,\tau_s) \in A(\pi,\sigma)} \sum_{i=1}^{s} d(\tau_i, e) \ge d((a\ a+1), e). \tag{9} \]

On the other hand, since $((a\ a+1)) \in A(\pi,\sigma)$,

\[ \min_{(\tau_1,\ldots,\tau_s) \in A(\pi,\sigma)} \sum_{i=1}^{s} d(\tau_i, e) \le d((a\ a+1), e). \tag{10} \]

From (9) and (10),

\[ d(\pi,\sigma) = \min_{(\tau_1,\ldots,\tau_s) \in A(\pi,\sigma)} \sum_{i=1}^{s} d(\tau_i, e) = d((a\ a+1), e), \]

where the equality $d(\pi,\sigma) = d((a\ a+1), e)$ follows from the left-invariance of $d$.

Next, suppose that $K(\pi,\sigma) > 1$, i.e., $\pi$ and $\sigma$ disagree on more than one pair of elements, and that for all $\mu,\eta \in \mathbb{S}_n$ with $K(\mu,\eta) < K(\pi,\sigma)$, the lemma holds. Then, there exists $\omega$, distinct from $\pi$ and $\sigma$ and between them, such that

\begin{align*}
d(\pi,\sigma) &= d(\pi,\omega) + d(\omega,\sigma), \\
K(\pi,\omega) &< K(\pi,\sigma), \\
K(\omega,\sigma) &< K(\pi,\sigma).
\end{align*}

By the induction hypothesis, there exist $(\nu_1,\ldots,\nu_k) \in A(\pi,\omega)$ and $(\nu_{k+1},\ldots,\nu_s) \in A(\omega,\sigma)$, for some $s$ and $k$, such that

\[ d(\pi,\omega) = \sum_{i=1}^{k} d(\nu_i, e), \qquad d(\omega,\sigma) = \sum_{i=k+1}^{s} d(\nu_i, e), \]

and thus

\[ d(\pi,\sigma) = \sum_{i=1}^{s} d(\nu_i, e) \ge \min_{(\tau_1,\ldots,\tau_{s'}) \in A(\pi,\sigma)} \sum_{i=1}^{s'} d(\tau_i, e), \]

where the inequality follows from the fact that $(\nu_1,\ldots,\nu_s) \in A(\pi,\sigma)$. To complete the proof, note that by the triangle inequality,

\[ d(\pi,\sigma) \le \min_{(\tau_1,\ldots,\tau_{s'}) \in A(\pi,\sigma)} \sum_{i=1}^{s'} d(\tau_i, e). \]

■

Definition 10. A distance $d_\varphi$ is termed a weighted Kendall distance if there exists a nonnegative weight function $\varphi$ over the set of adjacent transpositions such that

\[ d_\varphi(\pi,\sigma) = \min_{(\tau_1,\ldots,\tau_s) \in A(\pi,\sigma)} \sum_{i=1}^{s} \varphi_{\tau_i}, \]

where $\varphi_{\tau_i}$ is the weight assigned to transposition $\tau_i$ by $\varphi$.

The weight of a transform $\tau = (\tau_1,\ldots,\tau_s)$ is denoted by $\mathrm{wt}(\tau)$ and is defined as

\[ \mathrm{wt}(\tau) = \sum_{i=1}^{s} \varphi_{\tau_i}. \]

Hence, $d_\varphi(\pi,\sigma)$ may be written as

\[ d_\varphi(\pi,\sigma) = \min_{\tau \in A(\pi,\sigma)} \mathrm{wt}(\tau). \]

Note that a weighted Kendall distance is completely determined by its weight function $\varphi$.
Theorem 11. A distance $d$ satisfies Axioms II if and only if it is a weighted Kendall distance.

Proof: It follows immediately from Lemma 9 that a distance $d$ satisfying Axioms II is a weighted Kendall distance by letting

\[ \varphi_\theta = d(\theta, e) \]

for every transposition $\theta$ taken from the set of adjacent transpositions $\mathbb{A}_n$ in $\mathbb{S}_n$.

The proof of the converse is omitted since it is easy to verify that a weighted Kendall distance satisfies Axioms II. ■
The weighted Kendall distance provides a natural solution for the top-vs-bottom issue. For instance, recall the example of ranking cities to live in, with

\begin{align*}
\pi &= (\text{Melbourne, Vienna, Vancouver, Toronto, Calgary, Adelaide, Sydney, Helsinki, Perth, Auckland}), \\
\pi' &= (\text{Melbourne, Vienna, Vancouver, Calgary, Toronto, Adelaide, Sydney, Helsinki, Perth, Auckland}), \\
\pi'' &= (\text{Vienna, Melbourne, Vancouver, Toronto, Calgary, Adelaide, Sydney, Helsinki, Perth, Auckland}),
\end{align*}

and choose the weight function $\varphi_{(i\ i+1)} = 0.9^{i-1}$ for $i = 1, 2, \ldots, 9$. Since $\pi'$ differs from $\pi$ by the adjacent transposition $(4\ 5)$ and $\pi''$ differs from $\pi$ by $(1\ 2)$, we have $d_\varphi(\pi,\pi') = 0.9^{3} \approx 0.73 < d_\varphi(\pi,\pi'') = 1$, as expected. In this case, we have chosen the weight function to be exponentially decreasing - the choice of the weight function in general depends on the application.



Computing the Weighted Kendall Distance for Monotonic Weight Functions 

Computing the weighted Kendall distance between two permutations for an arbitrary weight function is not as straightforward
a task as computing the Kendall $\tau$ distance. However, in what follows, we show that for an important class of weight functions
- termed "monotonic" weight functions - the weighted Kendall distance may be computed efficiently.

Definition 12. A weight function $\varphi : \mathbb{A}_n \to \mathbb{R}^+$, where $\mathbb{A}_n$ as before denotes the set of adjacent transpositions in $\mathbb{S}_n$, is decreasing if $i > j$ implies that $\varphi_{(i\ i+1)} \le \varphi_{(j\ j+1)}$. Increasing weight functions are defined similarly. A weight function is monotonic if it is increasing or decreasing.

Monotonic weight functions are of importance in the top-vs-bottom model as they can be used to emphasize the significance
of the top of the ranking by assigning higher weights to transpositions at the top of the list. An example of a decreasing weight
function is the exponential weight described in the previous subsection.

Suppose that $\tau = (\tau_1,\ldots,\tau_{|\tau|})$ of length $|\tau|$ transforms $\pi$ into $\sigma$. The transformation may be viewed as a sequence of moves of elements $i$, $i = 1,\ldots,n$, from position $\pi^{-1}(i)$ to position $\sigma^{-1}(i)$. Let the walk followed by element $i$ while moved by the transform $\tau$ be denoted by $p^{i,\tau} = (p^{i,\tau}_1,\ldots,p^{i,\tau}_{|p^{i,\tau}|+1})$, where $|p^{i,\tau}|$ is the length of the walk $p^{i,\tau}$.



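For example, consider

\begin{align*}
\pi &= (3,2,4,1), \\
\sigma &= (1,2,3,4), \\
\tau &= (\tau_1,\tau_2,\tau_3,\tau_4) = ((3\ 4),(2\ 3),(1\ 2),(2\ 3)),
\end{align*}

and note that $\sigma = \pi\tau_1\tau_2\tau_3\tau_4$. We have

\begin{align*}
p^{1,\tau} &= (4,3,2,1), \\
p^{2,\tau} &= (2,3,2), \\
p^{3,\tau} &= (1,2,3), \\
p^{4,\tau} &= (3,4).
\end{align*}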
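The walks in this example can be reproduced mechanically. The following Python sketch is an illustration only: the function name `element_walks` and the representation of a ranking as a tuple whose entry at position $k$ is the element ranked $k$-th are assumptions of the sketch, not notation from the text. Each adjacent transposition is given as a pair of 1-based positions and is applied on the right, i.e., it swaps the contents of the two positions.

```python
def element_walks(pi, transform):
    """Apply adjacent transpositions (pairs of 1-based positions) to the
    ranking pi and record the positions visited by every element."""
    ranking = list(pi)
    # Each walk starts at the element's position in pi.
    walks = {x: [pos] for pos, x in enumerate(ranking, start=1)}
    for a, b in transform:
        ranking[a - 1], ranking[b - 1] = ranking[b - 1], ranking[a - 1]
        # After the swap, the element now at position a arrived from b,
        # and vice versa; extend both walks by one step.
        walks[ranking[a - 1]].append(a)
        walks[ranking[b - 1]].append(b)
    return tuple(ranking), walks


pi = (3, 2, 4, 1)
tau = [(3, 4), (2, 3), (1, 2), (2, 3)]
sigma, walks = element_walks(pi, tau)
for element in sorted(walks):
    print(element, walks[element])
```

Since every transposition moves exactly two elements, the lengths of the four walks add up to twice the number of transpositions in the transform.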



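We first bound the lengths of the walks $p^{i,\tau}$, $i \in [n]$. Let $I_i(\pi,\sigma)$ be the set consisting of elements $j \in [n]$ such that $\pi$ and $\sigma$ disagree on the pair $\{i,j\}$. In the transform $\tau$, all elements of $I_i(\pi,\sigma)$ must be swapped with $i$ by some $\tau_k$, $k \in [|\tau|]$. Each such swap contributes length one to the total length of the walk $p^{i,\tau}$ and thus $|p^{i,\tau}| \ge |I_i(\pi,\sigma)|$.

As before, let $d_\varphi$ denote the weighted Kendall distance with weight function $\varphi$. Since for any $\tau \in A(\pi,\sigma)$,

\[ \mathrm{wt}(\tau) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{|p^{i,\tau}|} \varphi_{(p^{i,\tau}_j\ p^{i,\tau}_{j+1})}, \]

we have

\[ d_\varphi(\pi,\sigma) = \min_{\tau \in A(\pi,\sigma)} \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{|p^{i,\tau}|} \varphi_{(p^{i,\tau}_j\ p^{i,\tau}_{j+1})}. \]

Thus,

\[ d_\varphi(\pi,\sigma) \ge \frac{1}{2} \sum_{i=1}^{n} \min_{p^i \in P_i(\pi,\sigma)} \sum_{j=1}^{|p^i|} \varphi_{(p^i_j\ p^i_{j+1})}, \tag{11} \]

where for each $i$, $P_i(\pi,\sigma)$ denotes the set of walks of length $|I_i(\pi,\sigma)|$, starting from $\pi^{-1}(i)$ and ending in $\sigma^{-1}(i)$. For convenience, let

\[ p^{i,*}(\pi,\sigma) = \arg\min_{p^i \in P_i(\pi,\sigma)} \sum_{j=1}^{|p^i|} \varphi_{(p^i_j\ p^i_{j+1})} \]

be the minimum weight walk from $\pi^{-1}(i)$ to $\sigma^{-1}(i)$ with length $|I_i(\pi,\sigma)|$.
If clear from the context, we write $p^{i,*}(\pi,\sigma)$ as $p^{i,*}$.

Algorithm 1 FindTauMonotone
Input: $\pi, \sigma \in \mathbb{S}_n$
Output: $\tau^* = \arg\min_{\tau \in A(\pi,\sigma)} \mathrm{wt}(\tau)$
    $\pi_0 \leftarrow \pi$
    $t \leftarrow 0$
    for $r = \sigma(1), \sigma(2), \ldots, \sigma(n)$ do
        while $\pi_t^{-1}(r) > \sigma^{-1}(r)$ do
            $\tau^*_{t+1} \leftarrow (\pi_t^{-1}(r) - 1 \ \ \pi_t^{-1}(r))$
            $\pi_{t+1} \leftarrow \pi_t \tau^*_{t+1}$
            $t \leftarrow t + 1$
        end while
    end for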

We show next that for decreasing weight functions, the bound given in (11) is achievable and thus the value on the right-hand side gives the weighted Kendall distance for this class of weight functions.

Consider $\pi, \sigma \in \mathbb{S}_n$ and a decreasing weight function $\varphi$. For each $i$, it follows that $p^{i,*}(\pi,\sigma)$ extends to positions with largest possible indices, i.e., $p^{i,*} = (\pi^{-1}(i), \ldots, \ell_i - 1, \ell_i, \ell_i - 1, \ldots, \sigma^{-1}(i))$, where $\ell_i$ is the solution to the equation

\[ \ell_i - \pi^{-1}(i) + \ell_i - \sigma^{-1}(i) = |I_i(\pi,\sigma)|, \]

and thus $\ell_i = \left(\pi^{-1}(i) + \sigma^{-1}(i) + |I_i(\pi,\sigma)|\right)/2$.

We show next that there exists a transform $\tau^*$ with $p^{i,\tau^*} = p^{i,*}$, and so equality in (11) can be achieved. The transform is described in Algorithm 1. The transform in question, $\tau^*$, converts $\pi$ into $\sigma$ in $n$ rounds. In Algorithm 1, the variable $r$ takes values $\sigma(1), \sigma(2), \ldots, \sigma(n)$, in that given order. For each value of $r$, $\tau^*$ moves $r$ through a sequence of adjacent transpositions from its current position in $\pi_t$, $\pi_t^{-1}(r)$, to position $\sigma^{-1}(r)$.
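The rounds of Algorithm 1 can be traced with a short Python sketch. This is a minimal illustration under stated assumptions: the names `find_tau_monotone` and `transform_weight`, the tuple representation of rankings, and the dictionary `phi` (mapping $i$ to the weight of $(i\ i+1)$) are all choices made for the sketch, and the numerical weights are arbitrary decreasing values.

```python
def find_tau_monotone(pi, sigma):
    """Move sigma(1), sigma(2), ... into place in turn, always by swapping
    the current element r with its left neighbor (Algorithm 1)."""
    current = list(pi)
    transform = []                       # pairs of 1-based positions
    for target, r in enumerate(sigma, start=1):
        pos = current.index(r) + 1       # current position of r
        while pos > target:
            current[pos - 2], current[pos - 1] = current[pos - 1], current[pos - 2]
            transform.append((pos - 1, pos))
            pos -= 1
    return transform


def transform_weight(transform, phi):
    """Sum of phi[i] over the adjacent transpositions (i i+1) used."""
    return sum(phi[i] for i, _ in transform)


phi = {1: 1.0, 2: 0.5, 3: 0.25}          # an arbitrary decreasing weight function
tau_star = find_tau_monotone((4, 3, 1, 2), (1, 2, 3, 4))
print(tau_star, transform_weight(tau_star, phi))
```

When round $r = \sigma(k)$ starts, positions $1, \ldots, k-1$ already hold $\sigma(1), \ldots, \sigma(k-1)$, so $r$ only ever needs to move to the left; this is why the inner loop has a single direction.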

Fix $i \in [n]$. For values of $r$, used in Algorithm 1, such that $\sigma^{-1}(r) < \sigma^{-1}(i)$, $i$ is swapped with $r$ via an adjacent transposition if $\pi^{-1}(r) > \pi^{-1}(i)$. For $r = i$, $i$ is swapped with all elements $k$ such that $\pi^{-1}(k) < \pi^{-1}(i)$ and $\sigma^{-1}(i) < \sigma^{-1}(k)$. For $r$ such that $\sigma^{-1}(r) > \sigma^{-1}(i)$, $i$ is not swapped with other elements. Hence, $i$ is swapped precisely with elements of the set $I_i(\pi,\sigma)$ and thus $|p^{i,\tau^*}(\pi,\sigma)| = |I_i(\pi,\sigma)|$. Furthermore, it can be seen that, for each $i$, $p^{i,\tau^*}(\pi,\sigma) = (\pi^{-1}(i), \ldots, \ell'_i - 1, \ell'_i, \ell'_i - 1, \ldots, \sigma^{-1}(i))$, for some $\ell'_i$. Since $|p^{i,\tau^*}(\pi,\sigma)| = |I_i(\pi,\sigma)|$, $\ell'_i$ also satisfies the equation

\[ \ell'_i - \pi^{-1}(i) + \ell'_i - \sigma^{-1}(i) = |I_i(\pi,\sigma)|, \]

implying that $\ell'_i = \ell_i$ and thus $p^{i,\tau^*} = p^{i,*}$. Consequently, one has the following result.

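Proposition 13. For rankings $\pi, \sigma \in \mathbb{S}_n$, and a decreasing weighted Kendall weight function $\varphi$, we have

\[ d_\varphi(\pi,\sigma) = \frac{1}{2} \sum_{i=1}^{n} \left( \sum_{j=\pi^{-1}(i)}^{\ell_i - 1} \varphi_{(j\ j+1)} + \sum_{j=\sigma^{-1}(i)}^{\ell_i - 1} \varphi_{(j\ j+1)} \right), \]

where $\ell_i = \left(\pi^{-1}(i) + \sigma^{-1}(i) + |I_i(\pi,\sigma)|\right)/2$.

Increasing weight functions may be analyzed similarly.

Example 14. Consider the rankings $\pi = 4312$ and $e = 1234$ and a decreasing weight function $\varphi$. We have $|I_i(\pi,e)| = 2$ for $i = 1, 2$ and $|I_i(\pi,e)| = 3$ for $i = 3, 4$. Furthermore,

\begin{align*}
\ell_1 &= \frac{3 + 1 + 2}{2} = 3, & p^{1,*} &= (3,2,1), \\
\ell_2 &= \frac{4 + 2 + 2}{2} = 4, & p^{2,*} &= (4,3,2), \\
\ell_3 &= \frac{2 + 3 + 3}{2} = 4, & p^{3,*} &= (2,3,4,3), \\
\ell_4 &= \frac{1 + 4 + 3}{2} = 4, & p^{4,*} &= (1,2,3,4).
\end{align*}

The minimum weight transformation is

\[ \tau^* = \big( \underbrace{(3\ 2), (2\ 1)}_{1},\ \underbrace{(4\ 3), (3\ 2)}_{2},\ \underbrace{(4\ 3)}_{3} \big), \]

where the numbers under the braces denote the value $r$ corresponding to the indicated transpositions. The distance between $\pi$ and $e$ is

\[ d_\varphi(\pi,e) = \varphi_{(1\ 2)} + 2\varphi_{(2\ 3)} + 2\varphi_{(3\ 4)}. \]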
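The closed form of Proposition 13 is easy to evaluate directly. The Python sketch below is illustrative only: the function name `monotone_kendall_distance`, the tuple representation of rankings and the dictionary `phi` (mapping $i$ to the weight of $(i\ i+1)$) are assumptions of the sketch, and the formula it evaluates is only valid when `phi` is decreasing.

```python
def monotone_kendall_distance(pi, sigma, phi):
    """Evaluate the closed form of Proposition 13 for a decreasing phi."""
    pos_pi = {x: k for k, x in enumerate(pi, start=1)}
    pos_sigma = {x: k for k, x in enumerate(sigma, start=1)}
    total = 0.0
    for i in pi:
        # |I_i|: number of elements j whose order relative to i differs.
        disagreements = sum(
            1 for j in pi
            if j != i
            and (pos_pi[i] < pos_pi[j]) != (pos_sigma[i] < pos_sigma[j])
        )
        # Turning point of the walk of element i; the sum is always even.
        ell = (pos_pi[i] + pos_sigma[i] + disagreements) // 2
        total += sum(phi[j] for j in range(pos_pi[i], ell))
        total += sum(phi[j] for j in range(pos_sigma[i], ell))
    return total / 2                      # each transposition is counted twice


phi = {1: 1.0, 2: 0.5, 3: 0.25}           # arbitrary decreasing weights
print(monotone_kendall_distance((4, 3, 1, 2), (1, 2, 3, 4), phi))
```

For the rankings of Example 14 and these weights, the value agrees with $\varphi_{(1\ 2)} + 2\varphi_{(2\ 3)} + 2\varphi_{(3\ 4)} = 1 + 1 + 0.5$.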

Example 15. The bound given in (11) is not tight for general weight functions, as seen in this example. Consider $\pi = (4,2,3,1)$, $\sigma = (1,2,3,4)$, and a weight function $\varphi$ with $\varphi_{(1\ 2)} = 2$, $\varphi_{(2\ 3)} = 1$, and $\varphi_{(3\ 4)} = 2$. Note that the domain of $\varphi$ is the set of adjacent transpositions. We have

\begin{align*}
p^{1,*} &= (4,3,2,1), \\
p^{2,*} &= (2,3,2), \\
p^{3,*} &= (3,2,3), \\
p^{4,*} &= (1,2,3,4).
\end{align*}

Suppose that a transform $\tau$ exists such that $p^{i,*} = p^{i,\tau}$, $i = 1, 2, 3, 4$. From the walks $p^{i,*}$, it follows that in $\tau$, transpositions $(1\ 2)$ and $(3\ 4)$ each appear once and $(2\ 3)$ appears three times. It can be shown, by considering all possible re-orderings of $\{(1\ 2), (3\ 4), (2\ 3), (2\ 3), (2\ 3)\}$ or by an application of [12, Lemma 5], that $\tau$ does not transform $\pi$ into $\sigma$. Hence, for this example, the lower bound (11) is not achievable.



Weight Functions with Two Identical Non-zero Weights 

Another example of a weighted Kendall $\tau$ distance for which a closed form solution may be found is described below.
For a pair of integers $a, b$, $1 \le a < b < n$, define the weight function as:

\[ \varphi_{(i\ i+1)} = \begin{cases} 1, & i \in \{a, b\}, \\ 0, & \text{else}, \end{cases} \tag{12} \]

i.e., a function which only penalizes moves involving candidates in positions $a$ and $b$.

Such weight functions may be used in voting problems where one only penalizes moving a link from one page (say, top-ten 
page) to another page (say, ten-to-twenty page). In other words, one only penalizes moving an item from a "high-ranked" set 
of positions to "average-rank" or "low-rank" positions. 

An algorithm for computing the weighted Kendall distance for this case is given in the Appendix. 



Approximating the Weighted Kendall Distance for General Weight Functions 

The result of the previous subsection implies that at least for one class of weight functions that capture the importance of the
top entries in a ranking, computing the weighted Kendall distance has time complexity $O(n^2)$. Hence, distance computation
efficiency does not represent a bottleneck for the employment of this form of the weighted Kendall distance.

In what follows, we present a polynomial-time 2-approximation algorithm for computing the most general form of weighted
Kendall distances, as well as two algorithms for computing this distance exactly. While the exact computation has
super-exponential time complexity, for a small number of candidates - say, fewer than 10 - the computation can be performed in
reasonable time. A small number of candidates and a large number of voters are frequently encountered in social choice
applications, but less frequently in computer science.

In order to approximate the weighted Kendall distance, $d_\varphi(\pi,\sigma)$, we use the function $D_\varphi(\pi,\sigma)$, defined as

\[ D_\varphi(\pi,\sigma) = \sum_{i=1}^{n} w\big(\pi^{-1}(i) : \sigma^{-1}(i)\big), \]

where

\[ w(k : l) = \begin{cases} \sum_{h=k}^{l-1} \varphi_{(h\ h+1)}, & \text{if } k < l, \\ \sum_{h=l}^{k-1} \varphi_{(h\ h+1)}, & \text{if } k > l, \\ 0, & \text{if } k = l, \end{cases} \]

denotes the sum of the weights of adjacent transpositions $(k\ k+1), (k+1\ k+2), \ldots, (l-1\ l)$ if $k < l$, the sum of the weights of adjacent transpositions $(l\ l+1), (l+1\ l+2), \ldots, (k-1\ k)$ if $l < k$, and $0$ if $k = l$.

The following proposition states lower and upper bounds for $d_\varphi$ in terms of $D_\varphi$. The proposition is useful in practice, since $D_\varphi$ can be computed in time $O(n^2)$, and provides the desired 2-approximation.

Proposition 16. For a weighted Kendall weight function $\varphi$ and for permutations $\pi$ and $\sigma$,

\[ \frac{1}{2} D_\varphi(\pi,\sigma) \le d_\varphi(\pi,\sigma) \le D_\varphi(\pi,\sigma). \]

We omit the proof of the proposition, since it follows from a more general result stated in the next section, and only remark
that the lower bound presented in the above proposition is weaker than the lower bound given by (11).
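As a quick numerical illustration of Proposition 16, the following Python sketch evaluates $D_\varphi$ for the rankings of Example 14. The function name `footrule_bound`, the tuple representation of rankings and the dictionary `phi` (mapping $i$ to the weight of $(i\ i+1)$) are assumptions of the sketch, and the weights are arbitrary decreasing values chosen so that the exact distance is known from Example 14.

```python
def footrule_bound(pi, sigma, phi):
    """D_phi(pi, sigma): for every element, add the weights of the adjacent
    transpositions lying between its positions in pi and in sigma."""
    pos_pi = {x: k for k, x in enumerate(pi, start=1)}
    pos_sigma = {x: k for k, x in enumerate(sigma, start=1)}
    total = 0.0
    for x in pi:
        low, high = sorted((pos_pi[x], pos_sigma[x]))
        total += sum(phi[h] for h in range(low, high))   # w(k : l)
    return total


phi = {1: 1.0, 2: 0.5, 3: 0.25}
D = footrule_bound((4, 3, 1, 2), (1, 2, 3, 4), phi)
# Example 14 gives d = phi_(12) + 2 phi_(23) + 2 phi_(34) for these rankings.
d_exact = phi[1] + 2 * phi[2] + 2 * phi[3]
print(D / 2, d_exact, D)
```

Both inequalities of the proposition hold with room to spare here, which is consistent with $D_\varphi$ being a 2-approximation rather than an exact value.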

Next, we discuss computing the exact weighted Kendall distance via algorithms for finding minimum weight paths in graphs.
As already pointed out, the Kendall $\tau$ and the weighted Kendall distance are graphic distances. In the latter case, we define a
graph $G$ with vertex set indexed by $\mathbb{S}_n$ and an edge of weight $\varphi_{(i\ i+1)}$, $i \in [n-1]$, between each pair of vertices $\pi$ and $\sigma$ for
which there exists an $i$ such that $\pi = \sigma(i\ i+1)$. The numbers of vertices and edges of $G$ are $|V| = n!$ and $|E| = n!(n-1)/2$,
respectively. Dijkstra's algorithm with Fibonacci heaps [14] for finding the minimum weight path in a graph provides the
distances of all $\pi \in \mathbb{S}_n$ to the identity in time $O(|E| + |V| \log |V|) = O(n!\, n \log n)$.
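For small $n$, the minimum weight path computation can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions: it uses the binary heap from the standard `heapq` module instead of a Fibonacci heap, so it does not attain the running time quoted above, and the function name `all_distances_to_identity` and the dictionary `phi` (mapping $i$ to the weight of $(i\ i+1)$) are choices made for the sketch.

```python
import heapq


def all_distances_to_identity(n, phi):
    """Dijkstra on the graph whose vertices are the rankings of 1..n and whose
    edges join rankings that differ by one adjacent transposition."""
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    heap = [(0, identity)]
    while heap:
        d, perm = heapq.heappop(heap)
        if d > dist[perm]:
            continue                      # stale heap entry
        for i in range(1, n):
            # Swap the contents of positions i and i+1 (1-based).
            nxt = perm[:i - 1] + (perm[i], perm[i - 1]) + perm[i + 1:]
            nd = d + phi[i]
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


dist = all_distances_to_identity(4, {1: 2, 2: 1, 3: 2})
print(dist[(4, 2, 3, 1)])
```

The graph is undirected, so the distance from the identity to $\pi$ equals the distance from $\pi$ to the identity. With all weights equal to one, the same routine returns the Kendall $\tau$ distance.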

The complexity of the algorithm for finding the distance between $\pi \in \mathbb{S}_n$ and the identity may actually be shown to be $O(n\,(K(\pi,e))!)$, which is significantly smaller than $O(n!\, n \log n)$ for permutations at small Kendall $\tau$ distance from the identity. The minimum weight path algorithm is based on the following observation. For $\pi$ in $\mathbb{S}_n$, there exists a transform $\tau = (\tau_1, \tau_2, \ldots, \tau_m)$ of minimum weight that transforms $\pi$ into $e$, such that $m = K(\pi,e)$. In other words, each transposition of $\tau$ eliminates one inversion when transforming $\pi$ into $e$. Hence, $\pi\tau_1$ has one less inversion than $\pi$. As a result,

\[ d_\varphi(\pi,e) = \min_{i\,:\,\pi(i) > \pi(i+1)} \left( \varphi_{(i\ i+1)} + d_\varphi(\pi(i\ i+1), e) \right). \tag{13} \]

Note that the minimum is taken over all positions $i$ for which $i$ and $i+1$ form an inversion, i.e., for which $\pi(i) > \pi(i+1)$.
Suppose that computing the weighted Kendall distance between the identity and a permutation $\pi$, with $K(\pi,e) = d$, can be performed in time $T_d$. From (13), we have

\[ T_d = a n + d\, T_{d-1}, \quad \text{for } d \ge 2, \]

and $T_1 = a n$, for some constant $a$. By letting $U_d = T_d/(a n\, d!)$, we obtain $U_d = U_{d-1} + \frac{1}{d!}$, $d \ge 2$, and $U_1 = 1$. Hence, $U_d = \sum_{i=1}^{d} \frac{1}{i!}$. It can then be shown that $d!\, U_d = \lfloor d!\,(\mathrm{e} - 1) \rfloor$, where $\mathrm{e}$ denotes Euler's number rather than the identity permutation, and thus $T_d = a n \lfloor d!\,(\mathrm{e} - 1) \rfloor = O(n\, d!)$.

The expression (13) can also be used to find the distances of all $\pi \in \mathbb{S}_n$ from the identity by first finding the distances of permutations $\pi \in \mathbb{S}_n$ with $K(\pi,e) = 1$, then finding the distances of permutations $\pi \in \mathbb{S}_n$ with $K(\pi,e) = 2$, and so on². Unfortunately, the average Kendall $\tau$ distance between a randomly chosen permutation and the identity is $\binom{n}{2}/2$ (see the derivation of this known and a related novel result regarding the weighted Kendall distance in the Appendix), which limits the applicability of this algorithm when the votes are chosen uniformly at random.
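The recursion (13) translates directly into code. The Python sketch below is a minimal illustration: the factory name `make_distance_to_identity` and the dictionary `phi` (mapping $i$ to the weight of $(i\ i+1)$) are assumptions of the sketch, and memoization via `functools.lru_cache` is added for convenience, so the running time of the sketch is not the $T_d$ analyzed above.

```python
from functools import lru_cache


def make_distance_to_identity(phi):
    """Return a function computing d_phi(perm, e) through recursion (13)."""

    @lru_cache(maxsize=None)
    def dist(perm):
        options = []
        for i in range(1, len(perm)):
            if perm[i - 1] > perm[i]:     # positions i, i+1 form an inversion
                nxt = perm[:i - 1] + (perm[i], perm[i - 1]) + perm[i + 1:]
                options.append(phi[i] + dist(nxt))
        # A ranking without adjacent inversions is the identity.
        return min(options) if options else 0

    return dist


dist = make_distance_to_identity({1: 2, 2: 1, 3: 2})
print(dist((4, 2, 3, 1)))
```

Only transpositions that remove an inversion are explored, so the recursion depth equals the Kendall $\tau$ distance of the input from the identity.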

Aggregation with Weighted Kendall Distances: Examples 

In order to explain the potential of the weighted Kendall distance in addressing the top-vs-bottom aggregation issue, in what 
follows, we present a number of examples that illustrate how the choice of the weight function influences the final form of 
the aggregate. We focus on decreasing weight functions and compare our results to those obtained using the classical Kendall
$\tau$ distance.

Throughout the remainder of the paper, we refer to a solution of the aggregation problem using the Kendall $\tau$ distance as a Kemeny
aggregate. All the aggregation results are obtained via exhaustive search since the examples are small and only used for 
illustrative purposes. Aggregation is, in general, a hard problem and we postpone the analysis of the complexity of computing 
aggregate rankings, and aggregate approximation algorithms, until the next section. 

Example 17. Consider the set of rankings listed in $\Sigma$, where each row represents a ranking (vote),

\[ \Sigma = \begin{pmatrix} 4 & 1 & 2 & 5 & 3 \\ 4 & 2 & 1 & 3 & 5 \\ 1 & 4 & 5 & 2 & 3 \\ 2 & 3 & 1 & 5 & 4 \\ 5 & 3 & 1 & 2 & 4 \end{pmatrix}. \]

The Kemeny optimal solution for this set of rankings is $(1,4,2,5,3)$. Note that despite the fact that candidate 4 was ranked
twice at the top of the list - more than any other candidate - it is ranked only second in the aggregate. This may be attributed
to the fact that 4 was ranked last by two voters.

Consider next the weight function $\varphi^{(2/3)}$ with $\varphi^{(2/3)}_{(i\ i+1)} = (2/3)^{i-1}$, $i \in [4]$. The optimum aggregate ranking for this weight
equals $(4,1,2,5,3)$. The optimum aggregate based on $\varphi^{(2/3)}$ puts 4 before 1, similar to what a plurality vote would do³. The
reason behind this swap is that $\varphi^{(2/3)}$ emphasizes strong showings of a candidate and downplays its weak showings, since
weak showings have a smaller effect on the distance as the weight function is decreasing. In other words, higher ranks are
more important than lower ranks when determining the position of a candidate.

2 Note that such an algorithm requires that the set of permutations at a given Kendall $\tau$ distance from the identity be known.

Example 18. Consider the set of rankings listed in $\Sigma$,

\[ \Sigma = \begin{pmatrix} 1 & 4 & 2 & 3 \\ 1 & 4 & 3 & 2 \\ 2 & 3 & 1 & 4 \\ 4 & 2 & 3 & 1 \\ 3 & 2 & 4 & 1 \end{pmatrix}. \]

The Kemeny optimal solution is $(4,2,3,1)$. Note that although the majority of voters prefer 1 to 4, 1 is ranked last and 4
is ranked first. More precisely, we observe that according to the pairwise majority test, 1 beats 4 but loses to 2 and 3. On
the other hand, 4 is preferred to both 2 and 3 but, as mentioned before, loses to 1. Problems like this do not arise due to
a weakness of Kemeny's approach, but due to the inherent "rational intractability" of rank aggregation. As stated by Arrow
[1], for any "reasonable" rank aggregation method, there exists a set of votes such that the aggregated ranking prefers one
candidate to another while the majority of voters prefer the latter to the former.

Let us now focus on a weighted Kendall distance with weight function $\varphi_{(i\ i+1)} = (2/3)^{i-1}$, $i = 1, 2, 3$. The optimal aggregate
ranking for this distance equals $(1,4,2,3)$. Again, we see a candidate with both strong showings and weak showings, candidate
1, beat a candidate with a rather average performance. Note that in this solution as well, there exist candidates for which the
opinion of the majority is ignored: 1 is placed before 2 and 3, while according to the pairwise majority opinion it loses to
both.

Example 19. Consider the set of rankings listed in $\Sigma$,

\[ \Sigma = \begin{pmatrix} 5 & 4 & 1 & 3 & 2 \\ 1 & 5 & 4 & 2 & 3 \\ 4 & 3 & 5 & 1 & 2 \\ 1 & 3 & 4 & 5 & 2 \\ 4 & 2 & 5 & 3 & 1 \\ 1 & 2 & 5 & 3 & 4 \\ 2 & 4 & 3 & 5 & 1 \end{pmatrix}. \]

With the weight function $\varphi_{(i\ i+1)} = (2/3)^{i}$, $i \in [4]$, the aggregate equals $(4,1,5,2,3)$. The winner is 4, while the plurality
rule winner is 1 as it appears three times on the top. Next, we increase the rate of decay of the weight function and let
$\varphi_{(i\ i+1)} = (1/3)^{i-1}$, $i \in [4]$. The solution now is $(1,4,2,5,3)$, and the winner is candidate 1, the same as the plurality rule
winner. This result is a consequence of the fact that the plurality winner is the aggregate based on the weighted Kendall
distance with weight function $\varphi^{(p)}$,

\[ \varphi^{(p)}_{(i\ i+1)} = \begin{cases} 1, & i = 1, \\ 0, & \text{else}. \end{cases} \]

The Kemeny aggregate is $(4,5,1,2,3)$.

A shortcoming of distance-based rank aggregation is that sometimes the solution is not unique, and that the possible solutions 
differ widely. The following example describes one such scenario. 

Example 20. Suppose that the votes are given by $\Sigma$,

\[ \Sigma = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \\ 3 & 2 & 1 \\ 2 & 1 & 3 \end{pmatrix}. \]

Here, the permutations $(1,2,3)$, $(2,1,3)$ are the Kemeny optimal solutions, with cumulative distance 4 from $\Sigma$. When the
Kemeny optimal solution is not unique, it may be possible to obtain a unique solution by using a non-uniform weight function.
In this example, it can be shown that for any non-uniform weight function $\varphi$ with $\varphi_{(1\ 2)} > \varphi_{(2\ 3)}$, the solution is unique,
namely, $(1,2,3)$.

A similar situation occurs if the last vote is changed to $(2,3,1)$. In that case, the permutations $(1,2,3)$, $(2,1,3)$, and $(2,3,1)$
are the Kemeny optimal solutions with cumulative distance 5 from $\Sigma$. Again, for any non-uniform weight function $\varphi$ with
$\varphi_{(1\ 2)} > \varphi_{(2\ 3)}$ the solution is unique and equal to $(1,2,3)$.

3 In plurality votes, the candidate with the most first-place rankings is declared the winner.
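The exhaustive search used for these small examples can be sketched as follows. The Python code is an illustration only: the names `distances_from` and `aggregate`, the tuple representation of votes and the dictionary `phi` (mapping $i$ to the weight of $(i\ i+1)$) are assumptions of the sketch, distances are obtained with a binary-heap Dijkstra search, and the particular non-uniform weights $(1, 0.5)$ are just one choice satisfying $\varphi_{(1\ 2)} > \varphi_{(2\ 3)}$.

```python
import heapq
from itertools import permutations


def distances_from(source, phi):
    """Weighted Kendall distances from source to every ranking (Dijkstra)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, perm = heapq.heappop(heap)
        if d > dist[perm]:
            continue
        for i in range(1, len(perm)):
            nxt = perm[:i - 1] + (perm[i], perm[i - 1]) + perm[i + 1:]
            nd = d + phi[i]
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def aggregate(votes, phi):
    """Return the minimum cumulative distance and all rankings attaining it."""
    per_vote = [distances_from(tuple(v), phi) for v in votes]
    totals = {c: sum(d[c] for d in per_vote)
              for c in permutations(sorted(votes[0]))}
    best = min(totals.values())
    return best, sorted(c for c, t in totals.items() if t == best)


votes = [(1, 2, 3), (1, 2, 3), (3, 2, 1), (2, 1, 3)]
print(aggregate(votes, {1: 1, 2: 1}))      # uniform weights: Kemeny aggregates
print(aggregate(votes, {1: 1, 2: 0.5}))    # non-uniform, phi_(12) > phi_(23)
```

The search enumerates all $n!$ candidate rankings, so it is only meant for examples of the size considered here.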

To summarize, the above examples illustrate how a proper choice for the weighted Kendall distance ensures that top ranks
are emphasized and how one may over-rule a moderate number of low rankings using a specialized distance formula. One
may argue that certain generalizations of Borda's method, involving non-uniform gaps between ranking scores, may achieve
similar goals. This is not the case, as will be illustrated in what follows.

One major difference between generalized Borda and weighted Kendall distances is in the already mentioned majority
criterion [15], which states that the candidate ranked first by the majority of voters has to be ranked first in the aggregate⁴.
Borda's aggregate with arbitrary score assignments does not have this property, while aggregates obtained via weighted
Kendall distances with decreasing weights (not identically equal to zero) have this property.

We first show that the Borda method with a fixed, but otherwise arbitrary, set of scores may not satisfy the majority criterion.
We prove this claim for $n = 3$. A similar argument can be used to establish this claim for $n > 3$.

Suppose, for simplicity, that the number $m$ of voters is odd and that, for each vote, a score $s_i$ is assigned to a candidate
with rank $i$, $i = 1, 2, 3$. Here, we assume that $s_1 > s_2 > s_3 \ge 0$. Suppose also that $(m+1)/2$ of the votes equal $(a,b,c)$
and that $(m-1)/2$ of the votes equal $(b,c,a)$. Let the total Borda scores for candidates $a$ and $b$ be denoted by $S$ and $S'$,
respectively. We have

\begin{align*}
S &= \frac{m+1}{2}\, s_1 + \frac{m-1}{2}\, s_3, \\
S' &= \frac{m+1}{2}\, s_2 + \frac{m-1}{2}\, s_1,
\end{align*}

and thus $S - S' = s_1 - m\left(\frac{s_2 - s_3}{2}\right) - \frac{s_2 + s_3}{2}$. If $m > \frac{2 s_1 - s_2 - s_3}{s_2 - s_3}$, then $S - S' < 0$ and Borda's method ranks $b$ higher than $a$.
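A concrete instance of this construction can be checked numerically. The Python sketch below is an illustration only: the function name `borda_totals`, the scores $(3, 2, 1)$ and the number of voters $m = 5$ are arbitrary choices made for the sketch, picked so that $m$ exceeds the threshold $(2 s_1 - s_2 - s_3)/(s_2 - s_3) = 3$.

```python
from collections import Counter


def borda_totals(votes, scores):
    """Total score of every candidate when rank i earns scores[i - 1]."""
    totals = Counter()
    for vote in votes:
        for rank, candidate in enumerate(vote):
            totals[candidate] += scores[rank]
    return totals


scores = (3, 2, 1)                        # s_1 > s_2 > s_3
m = 5                                     # odd number of voters, m > 3
votes = [("a", "b", "c")] * ((m + 1) // 2) + [("b", "c", "a")] * ((m - 1) // 2)

totals = borda_totals(votes, scores)
first_place = Counter(vote[0] for vote in votes)
print(totals, first_place)
```

Candidate $a$ is ranked first by three of the five voters, yet its total score is smaller than that of candidate $b$.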

As a result, candidate $a$, ranked highest by more than half of the voters, is not ranked first according to Borda's rule. This is
not the case with weighted Kendall distances, as shown below.

Proposition 21. An aggregate ranking obtained using the weighted Kendall distance with a decreasing weight function not
identically equal to zero satisfies the majority criterion.

Proof: Suppose that the weight function is $\varphi$, and let $w_i = \varphi_{(i\ i+1)}$. Since $w$ is decreasing and not identically equal to
zero, we have $w_1 > 0$. Let $a_1$ be a candidate that is ranked first by a majority of voters. Partition the set of votes into two
sets, $C$ and $D$, where $C$ is the set of votes that rank $a_1$ first and $D$ is the set of votes that do not. Furthermore, denote the
aggregate ranking by $\pi$.

Suppose that $a_1$ is not ranked first in $\pi$ and that $\pi$ is of the form

\[ (a_2, \ldots, a_i, a_1, a_{i+1}, \ldots, a_n), \]

for some $i \ge 2$. Let $\pi' = (a_1, a_2, \ldots, a_n)$. We show that

\[ \sum_{j=1}^{m} d_\varphi(\pi, \sigma_j) > \sum_{j=1}^{m} d_\varphi(\pi', \sigma_j), \]

which contradicts the optimality of $\pi$. Hence, $a_1$ must be ranked first in $\pi$.
For $\sigma \in C$, we have

\[ d_\varphi(\pi,\sigma) = d_\varphi(\pi,\pi') + d_\varphi(\pi',\sigma). \tag{14} \]

To see the validity of this claim, note that if $\pi$ is to be transformed to $\sigma$ via Algorithm 1, it is first transformed to $\pi'$ by
moving $a_1$ to the first position.
For $\sigma \in D$, we have

\[ d_\varphi(\pi',\sigma) \le d_\varphi(\pi',\pi) + d_\varphi(\pi,\sigma), \tag{15} \]

which follows from the triangle inequality.

To complete the proof, we write

\begin{align*}
\sum_{j=1}^{m} d_\varphi(\pi,\sigma_j) &= \sum_{\sigma \in C} d_\varphi(\pi,\sigma) + \sum_{\sigma \in D} d_\varphi(\pi,\sigma) \\
&\ge \sum_{\sigma \in C} d_\varphi(\pi',\sigma) + |C|\, d_\varphi(\pi,\pi') + \sum_{\sigma \in D} d_\varphi(\pi',\sigma) - |D|\, d_\varphi(\pi,\pi') \\
&= \sum_{j=1}^{m} d_\varphi(\pi',\sigma_j) + (|C| - |D|)\, d_\varphi(\pi,\pi') \\
&> \sum_{j=1}^{m} d_\varphi(\pi',\sigma_j),
\end{align*}

where the first inequality follows from (14) and (15), and the second inequality follows from the facts that $|C| > |D|$ and that
$d_\varphi(\pi,\pi') \ge w_1 > 0$.

■

4 Note that a candidate ranked first by the majority is a Condorcet candidate. It is desirable that an aggregation rule satisfy the majority criterion and indeed
most do, including the Condorcet method, the plurality rule, the single transferable vote method, and the Coombs method.

IV. Weighted Transposition Distance 

The definition of the Kendall r distance and the weighted Kendall distance is based on transforming one permutation into 
another using adjacent transpositions. If, instead, all transpositions are allowed - including non-adjacent transpositions - a more 
general distance measure, termed weighted transposition distance is obtained. This distance measure, as will be demonstrated 
below, represents a generalization of the weighted Kendall distance suitable for addressing similarity issues among candidates. 
It is worth pointing out that the weighted transposition distance is not based on the axiomatic approach described in the 
previous section. 

Definition 22. Consider a function $\varphi$ that assigns to each transposition $\theta$ a non-negative weight $\varphi_\theta$. The weight of a sequence
of transpositions is defined as the sum of the weights of its transpositions. That is, the weight of the sequence $\tau = (\tau_1, \ldots, \tau_{|\tau|})$
of transpositions equals

\[ \mathrm{wt}(\tau) = \sum_{i=1}^{|\tau|} \varphi_{\tau_i}. \]

For simplicity, we also denote the weighted transposition distance between two permutations $\pi, \sigma \in \mathbb{S}_n$, with weight function $\varphi$,
by $d_\varphi$. This distance equals the minimum weight of a sequence $\tau = (\tau_1, \ldots, \tau_{|\tau|})$ of transpositions such that $\sigma = \pi\tau_1 \cdots \tau_{|\tau|}$.
As before, we refer to such a sequence of transpositions as a transform converting $\pi$ into $\sigma$ and let $A_T(\pi,\sigma)$ denote the set
of transforms that convert $\pi$ into $\sigma$.

With this notation at hand, the weighted transposition distance between $\pi$ and $\sigma$ may be written as

\[ d_\varphi(\pi,\sigma) = \min_{\tau \in A_T(\pi,\sigma)} \mathrm{wt}(\tau). \]

The Kendall $\tau$ distance and the weighted Kendall distance may be viewed as special cases of the weighted transposition
distance: to obtain the Kendall $\tau$ distance, let

\[ \varphi_\theta = \begin{cases} 1, & \theta = (i\ i+1),\ i = 1, 2, \ldots, n-1, \\ \infty, & \text{else}, \end{cases} \]

and to obtain the weighted Kendall distance, let

\[ \varphi_\theta = \begin{cases} w_i, & \theta = (i\ i+1),\ i = 1, 2, \ldots, n-1, \\ \infty, & \text{else}, \end{cases} \]

for a non-negative weight function $w$.

When applied to the inverse of rankings, the weighted transposition distance can be successfully used to model similarities of 
objects in rankings. In such a setting, permutations that differ by a transposition of two similar items are at a smaller distance 
than permutations that differ by a transposition of two dissimilar items, as demonstrated in the next subsection.

A. Weighted Transposition Distance as Similarity Distance: Examples

We illustrate the concept of distance measures taking into account similarities via the following example, already mentioned 
in the Motivation section. Suppose that four cities: Melbourne, Sydney, Helsinki, and Vienna are ranked based on certain 
criteria as 

\[ \pi = (\text{Helsinki, Sydney, Vienna, Melbourne}), \]
and according to another set of criteria as

\[ \sigma = (\text{Melbourne, Vienna, Helsinki, Sydney}). \]

The distance between $\pi$ and $\sigma$ is defined as follows. We assign weights to swapping the positions of cities in the rankings,
e.g., suppose that the weight of swapping cities in the same country is 1, on the same continent 2, and 3 otherwise. The similarity
distance between $\pi$ and $\sigma$ is the minimum weight of a sequence of transpositions that convert $\pi$ into $\sigma$. By inspection, or
by methods discussed in the next subsection, one can see that the similarity distance between $\pi$ and $\sigma$ equals 6. One of the
sequences of transpositions of weight 6 is as follows: first swap Helsinki and Sydney with weight 3, then swap Melbourne
and Sydney with weight 1, and finally swap Vienna and Helsinki with weight 2.

To express the similarity distance formally, we write the rankings as permutations, representing Melbourne by 1, Sydney by
2, Vienna by 3, and Helsinki by 4. This is equivalent to assuming that the identity ranking is

\[ e = (\text{Melbourne, Sydney, Vienna, Helsinki}). \]

We then have $\pi = (4,2,3,1)$ and $\sigma = (1,4,2,3)$. The weight function $\varphi$ equals

\begin{align*}
\varphi_{(1\ 2)} &= 1, & \varphi_{(1\ 3)} &= 3, & \varphi_{(1\ 4)} &= 3, \\
\varphi_{(2\ 3)} &= 3, & \varphi_{(2\ 4)} &= 3, & \varphi_{(3\ 4)} &= 2.
\end{align*}

It should be clear from the context that the indices in the weight function refer to the candidates, and not to their positions. 
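The value 6 quoted for the city example can be confirmed by a small search over rankings. The Python sketch below is an illustration under stated assumptions: the function names, the `COUNTRY` and `CONTINENT` lookup tables and the use of a binary-heap Dijkstra search are choices made for the sketch, and rankings are kept as tuples of city names so that a move exchanges the positions of two named cities.

```python
import heapq
from itertools import combinations

COUNTRY = {"Melbourne": "Australia", "Sydney": "Australia",
           "Vienna": "Austria", "Helsinki": "Finland"}
CONTINENT = {"Melbourne": "Oceania", "Sydney": "Oceania",
             "Vienna": "Europe", "Helsinki": "Europe"}


def swap_weight(a, b):
    """1 within a country, 2 within a continent, 3 otherwise."""
    if COUNTRY[a] == COUNTRY[b]:
        return 1
    return 2 if CONTINENT[a] == CONTINENT[b] else 3


def similarity_distance(pi, sigma):
    """Minimum total weight of city swaps turning ranking pi into sigma."""
    dist = {pi: 0}
    heap = [(0, pi)]
    while heap:
        d, perm = heapq.heappop(heap)
        if perm == sigma:
            return d
        if d > dist[perm]:
            continue
        for i, j in combinations(range(len(perm)), 2):
            nxt = list(perm)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            nxt = tuple(nxt)
            nd = d + swap_weight(perm[i], perm[j])
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return None


pi = ("Helsinki", "Sydney", "Vienna", "Melbourne")
sigma = ("Melbourne", "Vienna", "Helsinki", "Sydney")
print(similarity_distance(pi, sigma))
```

Here the weight of a swap depends on which two cities are exchanged and not on the positions they occupy, in line with the remark that the indices of the weight function refer to candidates.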
Example 23. Consider the votes listed in $\Sigma$ below,

\[ \Sigma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 1 & 4 \\ 4 & 1 & 3 & 2 \end{pmatrix}. \]

Suppose that even numbers and odd numbers represent different types of candidates in a way that the following weight function
is appropriate:

\[ \varphi_{(i\ j)} = \begin{cases} 1, & \text{if } i, j \text{ are both odd or both even}, \\ 2, & \text{else}. \end{cases} \]

Note that the votes are "diverse" in the sense that they alternate between odd and even numbers. On the other hand, the
Kemeny aggregate is $(1,3,2,4)$, which puts all odd numbers ahead of all even numbers. Aggregation using the similarity
distance described above yields $(1,2,3,4)$, a solution which may be considered "diverse" since the even and odd numbers
alternate in the solution. The reason behind this result is that the Kemeny optimal solution is oblivious to the identity of the
candidates and their (dis)similarities, while aggregation based on similarity distances takes such information into account.

Example 24. Consider the votes listed in $\Sigma$ below,

\[ \Sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 2 & 3 & 4 & 5 & 6 \\ 3 & 6 & 5 & 2 & 1 & 4 \\ 3 & 6 & 5 & 2 & 1 & 4 \\ 5 & 4 & 1 & 6 & 3 & 2 \end{pmatrix}. \]

Suppose that the weight function is the same as the one used in the previous example. In this case, neither the Kemeny
aggregates nor the weighted transposition distance aggregates are unique. More precisely, Kendall $\tau$ gives four solutions:

\[ \begin{pmatrix} 3 & 5 & 1 & 6 & 2 & 4 \\ 3 & 5 & 1 & 2 & 4 & 6 \\ 1 & 3 & 5 & 2 & 4 & 6 \\ 1 & 3 & 5 & 6 & 2 & 4 \end{pmatrix}, \]

while there exist nine optimal aggregates under the weighted transposition distance of the previous example, of total distance
10:

\[ \begin{pmatrix} 5 & 6 & 3 & 4 & 1 & 2 \\ 5 & 4 & 3 & 2 & 1 & 6 \\ 5 & 2 & 3 & 6 & 1 & 4 \\ 3 & 4 & 1 & 2 & 5 & 6 \\ 3 & 6 & 1 & 4 & 5 & 2 \\ 3 & 2 & 1 & 6 & 5 & 4 \\ 1 & 4 & 5 & 2 & 3 & 6 \\ 1 & 2 & 5 & 6 & 3 & 4 \\ 1 & 6 & 5 & 4 & 3 & 2 \end{pmatrix}. \]

Note that none of the Kemeny optimal aggregates have good diversity properties: the top half of the rankings consists
exclusively of odd numbers. On the other hand, the optimal weighted transposition rankings all contain exactly one even
element among the top-three candidates. Such diversity properties are hard to prove theoretically.

B. Computing the Weighted Transposition Distance 

In this subsection, we describe how to compute or approximate the weighted transposition distance $d_\varphi$, given the weight
function $\varphi$. An in-depth analysis of a special class of weight functions and their corresponding transposition distances may be
found in the authors' recent work [12].

We find the following definitions useful in our subsequent derivations. For a given weight function $\varphi$, we let $\mathcal{K}_\varphi$ denote
a complete undirected weighted graph with vertex set $[n]$, where the weight of each edge $(i,j)$ equals $\varphi_{(i\,j)}$, the weight of the
transposition $(i\ j)$. For a subgraph $H$ of $\mathcal{K}_\varphi$, with edge set $E_H$, we define the weight of $H$ as

\[
\mathrm{wt}(H) = \sum_{(i,j)\in E_H} \varphi_{(i\,j)},
\]

that is, the sum of the weights of edges of $H$. For $\pi,\sigma\in\mathbb{S}_n$, we define $D_\varphi(\pi,\sigma)$ as

\[
D_\varphi(\pi,\sigma) = \sum_{i=1}^{n} \mathrm{wt}\big(p^*_\varphi(\pi^{-1}(i),\sigma^{-1}(i))\big),
\]

where $p^*_\varphi(a,b)$ denotes the minimum weight path from $a$ to $b$ in $\mathcal{K}_\varphi$.
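
The definition of $D_\varphi$ can be evaluated directly once the minimum path weights in $\mathcal{K}_\varphi$ are known. The sketch below is one possible way to do this and is not taken from the paper: it assumes $\varphi$ is given as a symmetric matrix `phi` indexed by ranks $0,\dots,n-1$, uses the Floyd-Warshall recursion for the path weights, and represents rankings as tuples of candidates $1,\dots,n$ listed from best to worst. The names `shortest_path_weights`, `D_phi` and `kendall_phi` are hypothetical.

```python
INF = float("inf")


def shortest_path_weights(phi):
    """All-pairs minimum path weights wt(p*(a, b)) in K_phi (Floyd-Warshall)."""
    n = len(phi)
    dist = [[0 if a == b else phi[a][b] for b in range(n)] for a in range(n)]
    for k in range(n):
        for a in range(n):
            for b in range(n):
                if dist[a][k] + dist[k][b] < dist[a][b]:
                    dist[a][b] = dist[a][k] + dist[k][b]
    return dist


def D_phi(pi, sigma, phi):
    """Sum over candidates of the path weight between their ranks in pi and sigma."""
    dist = shortest_path_weights(phi)
    rank_pi = {c: r for r, c in enumerate(pi)}
    rank_sigma = {c: r for r, c in enumerate(sigma)}
    return sum(dist[rank_pi[c]][rank_sigma[c]] for c in pi)


def kendall_phi(n):
    """Extended-path weights: adjacent transpositions cost 1, all others infinity."""
    return [[1 if abs(a - b) == 1 else INF for b in range(n)] for a in range(n)]


# With unit weights on adjacent transpositions, D_phi reduces to Spearman's footrule.
print(D_phi((3, 2, 1), (1, 2, 3), kendall_phi(3)))
```

For the unit-weight Kendall case the minimum path weight between ranks $a$ and $b$ is $|a-b|$, so the printed value is the footrule distance between the two rankings.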

It is easy to verify that $D_\varphi$ is a pseudo-metric and that it is left-invariant,

\[
D_\varphi(\eta\pi,\eta\sigma) = D_\varphi(\pi,\sigma), \qquad \pi,\sigma,\eta\in\mathbb{S}_n.
\]

A weight function $\varphi$ is a metric weight function if it satisfies the triangle inequality in the sense that

\[
\varphi_{(a\,b)} \le \varphi_{(a\,c)} + \varphi_{(b\,c)}, \qquad a,b,c\in[n]. \tag{16}
\]
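Lemma 25. For a non-negative weight function $\varphi$ and a transposition $(a\ b)\in\mathbb{S}_n$,

\[
d_\varphi((a\ b),e) \le 2\,\mathrm{wt}\big(p^*_\varphi(a,b)\big).
\]

If $\varphi$ is a metric weight function, the bound may be improved to

\[
d_\varphi((a\ b),e) \le \mathrm{wt}\big(p^*_\varphi(a,b)\big).
\]

Proof: Consider a path $p = (v_0 = a, v_1, \dots, v_{|p|} = b)$ from $a$ to $b$ in $\mathcal{K}_\varphi$. We have

\[
(a\ b) = (v_0\ v_1)(v_1\ v_2)\cdots(v_{|p|-2}\ v_{|p|-1})\,(v_{|p|-1}\ v_{|p|})\,(v_{|p|-2}\ v_{|p|-1})\cdots(v_1\ v_2)(v_0\ v_1).
\]

From the left-invariance of $d_\varphi$ and the triangle inequality,

\[
d_\varphi((a\ b),e) \le 2\sum_{i=1}^{|p|-1}\varphi_{(v_{i-1}\,v_i)} + \varphi_{(v_{|p|-1}\,v_{|p|})}
= 2\,\mathrm{wt}(p) - \varphi_{(v_{|p|-1}\,v_{|p|})}
\le 2\,\mathrm{wt}(p).
\]

Since $p$ is an arbitrary path from $a$ to $b$ in $\mathcal{K}_\varphi$, we have

\[
d_\varphi((a\ b),e) \le 2\,\mathrm{wt}\big(p^*_\varphi(a,b)\big),
\]

and this proves the first claim.

Now, assume that $\varphi$ is a metric weight function and consider the path $p = (v_0, v_1, \dots, v_{|p|})$ from $v_0 = a$ to $v_{|p|} = b$. From
(16),

\[
\begin{aligned}
\varphi_{(a\,b)} = \varphi_{(v_0\,v_{|p|})} &\le \varphi_{(v_0\,v_1)} + \varphi_{(v_1\,v_{|p|})} \\
&\le \varphi_{(v_0\,v_1)} + \varphi_{(v_1\,v_2)} + \varphi_{(v_2\,v_{|p|})} \\
&\le \cdots \\
&\le \sum_{i=0}^{|p|-1} \varphi_{(v_i\,v_{i+1})} \\
&= \mathrm{wt}(p).
\end{aligned}
\]

Since $p$ is arbitrary, we have

\[
d_\varphi((a\ b),e) \le \varphi_{(a\,b)} \le \mathrm{wt}\big(p^*_\varphi(a,b)\big).
\]

This completes the proof of the lemma. ■

Footnote 1: Note that the definition of $D_\varphi$ is consistent with the definition of a specialization of this function, given in Proposition 16.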
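
The path decomposition used in the proof of Lemma 25 can be checked mechanically. The sketch below is illustrative only: it assumes vertices are labelled $0,\dots,n-1$, that `phi` is a symmetric weight matrix, and that a transposition acts by swapping two entries of a list; `path_decomposition`, `apply_transpositions` and `decomposition_weight` are hypothetical helper names. Because the sequence of transpositions is a palindrome, the result does not depend on the order in which the product is read.

```python
def path_decomposition(path):
    """Transpositions (v0 v1)(v1 v2)...(v_{k-1} v_k)...(v1 v2)(v0 v1) along a path."""
    edges = list(zip(path, path[1:]))
    # Walk forward along every edge, then back over all edges except the last one.
    return edges + edges[-2::-1]


def apply_transpositions(n, transpositions):
    """Apply the swaps one after another to the identity arrangement."""
    perm = list(range(n))
    for a, b in transpositions:
        perm[a], perm[b] = perm[b], perm[a]
    return perm


def decomposition_weight(path, phi):
    """Total weight of the decomposition: 2 wt(p) minus the weight of the last edge."""
    return sum(phi[a][b] for a, b in path_decomposition(path))


path = [0, 2, 1, 3]
phi = [[abs(a - b) for b in range(4)] for a in range(4)]
print(apply_transpositions(4, path_decomposition(path)))
print(decomposition_weight(path, phi))
```

The first printed list swaps only the two endpoints of the path, and the second value equals $2\,\mathrm{wt}(p)$ minus the weight of the last edge, as in the first chain of inequalities of the proof.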
While Lemma 25 suffices to prove all our subsequent results, we remark that one may prove a slightly stronger result,
presented in our companion paper [12]:

\[
d_\varphi((a\ b),e) = \min_{p=(v_0=a,\,v_1,\,\dots,\,v_{|p|}=b)} \Big( 2\,\mathrm{wt}(p) - \max_{0\le i<|p|} \varphi_{(v_i\,v_{i+1})} \Big).
\]

The proof is based on significantly more involved techniques that are beyond the scope of this paper.
Lemma 26. For a weight function $\varphi$ and for $\pi,\sigma\in\mathbb{S}_n$,

\[
d_\varphi(\pi,\sigma) \le 2D_\varphi(\pi,\sigma).
\]

If $\varphi$ is a metric weight function, the bound may be improved to

\[
d_\varphi(\pi,\sigma) \le D_\varphi(\pi,\sigma).
\]

Proof: To prove the first claim, it suffices to show that $d_\varphi(\pi,e) \le 2D_\varphi(\pi,e)$ since both $d_\varphi$ and $D_\varphi$ are left-invariant.
Let $\{c_1, c_2, \dots, c_k\}$ be the cycle decomposition of $\pi$. We have, from the triangle inequality and the left-invariance property
of $d_\varphi$, that

\[
d_\varphi(\pi,e) \le \sum_{i=1}^{k} d_\varphi(c_i,e),
\]

and, from the definition of $D_\varphi$, that

\[
D_\varphi(\pi,e) = \sum_{i=1}^{k} D_\varphi(c_i,e).
\]

Hence, we only need to prove that

\[
d_\varphi(c,e) \le 2D_\varphi(c,e) \tag{17}
\]

for a single cycle $c = (a_1\ a_2\ \cdots\ a_{|c|})$, where $|c|$ is the length of $c$.
Since $c$ may be written as

\[
c = (a_1\ a_2)(a_2\ a_3)\cdots(a_{|c|-1}\ a_{|c|}),
\]

we have

\[
\begin{aligned}
d_\varphi(c,e) &\le \sum_{i=1}^{|c|-1} d_\varphi((a_i\ a_{i+1}),e) \\
&\overset{(a)}{\le} \sum_{i=1}^{|c|-1} 2\,\mathrm{wt}\big(p^*_\varphi(a_i,a_{i+1})\big) \\
&\le \sum_{i=1}^{|c|} 2\,\mathrm{wt}\big(p^*_\varphi(a_i,c(a_i))\big) \\
&\le 2D_\varphi(c,e),
\end{aligned}
\]

where (a) follows from Lemma 25.

The proof of the second claim is similar. ■

Figure 3: A defining path (a), which may correspond to a metric-path weight function or an extended-path weight function,
and a defining tree (b), which may correspond to a metric-tree weight function or an extended-tree weight function.

The next lemma provides a lower bound for $d_\varphi$ in terms of $D_\varphi$.

Lemma 27. For $\pi,\sigma\in\mathbb{S}_n$,

\[
d_\varphi(\pi,\sigma) \ge \frac{1}{2}D_\varphi(\pi,\sigma).
\]

Proof: Since $d_\varphi$ and $D_\varphi$ are both left-invariant, it suffices to show that

\[
d_\varphi(\pi,e) \ge \frac{1}{2}D_\varphi(\pi,e).
\]

Let $(\tau_1,\dots,\tau_l)$, with $\tau_j = (a_j\ b_j)$, be a minimum weight transform of $\pi$ into $e$, so that $d_\varphi(\pi,e) = \sum_{i=1}^{l}\varphi_{(a_i\,b_i)}$. Furthermore,
define $\pi_j = \pi\tau_1\cdots\tau_j$, $0\le j\le l$. Then,

\[
D_\varphi(\pi_{j-1},e) - D_\varphi(\pi_j,e) \le 2\,\mathrm{wt}\big(p^*_\varphi(a_j,b_j)\big) \le 2\varphi_{(a_j\,b_j)}, \tag{18}
\]

where the first inequality follows from considering the maximum possible decrease of the value of $D_\varphi$ induced by one
transposition, while the second inequality follows from the definition of $p^*_\varphi$. By summing up the terms in (18) over $1\le j\le l$,
and thus obtaining a telescoping inequality of the form $D_\varphi(\pi,e) \le 2\sum_{i=1}^{l}\varphi_{(a_i\,b_i)} = 2d_\varphi(\pi,e)$, we arrive at the desired
result. ■
From the previous two lemmas, we have the following theorem.

Theorem 28. For $\pi,\sigma\in\mathbb{S}_n$ and an arbitrary non-negative weight function $\varphi$, we have

\[
\frac{1}{2}D_\varphi(\pi,\sigma) \le d_\varphi(\pi,\sigma) \le 2D_\varphi(\pi,\sigma).
\]

In addition, if $\varphi$ is a metric weight function, then

\[
\frac{1}{2}D_\varphi(\pi,\sigma) \le d_\varphi(\pi,\sigma) \le D_\varphi(\pi,\sigma).
\]

For special classes of the weight function $\varphi$, the bounds in Theorem 28 may be improved further, as described in the next
subsection.

C. Computing the Transposition Distance for Metric-Tree Weights 
We start with the following definitions. 

Definition 29. A weight function $\varphi$ is a metric-tree weight function if there exists a weighted tree $\Theta$ over the vertex set $[n]$
such that for distinct $a,b\in[n]$, $\varphi_{(a\,b)}$ is the sum of the weights of the edges on the unique path from $a$ to $b$ in $\Theta$. If $\Theta$ is a
path, i.e., if $\Theta$ is a linear graph, then $\varphi$ is called a metric-path weight function.

Furthermore, a weight function $\varphi'$ is an extended-tree weight function if there exists a weighted tree $\Theta$ over the vertex set $[n]$
such that for distinct $a,b\in[n]$, $\varphi'_{(a\,b)}$ equals the weight of the edge $(a,b)$ whenever $a$ and $b$ are adjacent, and $\varphi'_{(a\,b)} = \infty$
otherwise. If $\Theta$ is a path, then $\varphi'$ is called an extended-path weight function.

Figure 4: The cycle (2 6 4 7 5) in Figure (a) is decomposed into two cycles, (2 6 4) and (4 7 5), depicted in Figure (b). Note
that (2 6 4 7 5) = (2 6 4)(4 7 5).

Note that the Kendall weight function, defined in the previous section, is an extended-path weight function.

The tree or path corresponding to a weight function in the above definitions is termed the defining tree or path of the weight
function. An example is given in Figure 3, where the numbers indexing the edges denote their weights.

For a metric-tree weight function $\varphi$ with defining tree $\Theta$, and for $a,b\in[n]$, the weight of the path $p^*_\varphi(a,b)$ equals the
weight of the unique path from $a$ to $b$ in $\Theta$. This weight, in turn, equals $\varphi_{(a\,b)}$. As a result, for metric-tree weights, $\mathrm{wt}(p^*_\varphi(a,b))$
equals the weight of the path from $a$ to $b$ in $\Theta$.

Furthermore, from Lemma 27 we have $d_\varphi((a\ b),e) \ge \frac{1}{2}D_\varphi((a\ b),e) = \mathrm{wt}(p^*_\varphi(a,b)) = \varphi_{(a\,b)}$. Since we also have
$d_\varphi((a\ b),e) \le \varphi_{(a\,b)}$, it follows that

\[
d_\varphi((a\ b),e) = \varphi_{(a\,b)}. \tag{19}
\]
The next lemma shows that the exact distance for metric-path weight functions can be computed in polynomial time.

Lemma 30. For a metric-path weight function $\varphi$ and for $\pi,\sigma\in\mathbb{S}_n$,

\[
d_\varphi(\pi,\sigma) = \frac{1}{2}D_\varphi(\pi,\sigma).
\]

Proof: From Lemma 27, we have that $d_\varphi(\pi,\sigma) \ge \frac{1}{2}D_\varphi(\pi,\sigma)$. It remains to show that $d_\varphi(\pi,\sigma) \le \frac{1}{2}D_\varphi(\pi,\sigma)$. Since $d_\varphi$
and $D_\varphi$ are both left-invariant, it suffices to prove that $d_\varphi(\pi,e) \le \frac{1}{2}D_\varphi(\pi,e)$.

Let $\{c_1,c_2,\dots,c_k\}$ be the cycle decomposition of $\pi$. Similar to the proof of Lemma 26, it suffices to show that

\[
d_\varphi(c,e) \le \frac{1}{2}D_\varphi(c,e) \tag{20}
\]

for any cycle $c = (a_1\ a_2\ \cdots\ a_{|c|})$.

The proof is by induction. For $|c| = 2$, (20) holds since, from (19), we have

\[
d_\varphi((a_1\ a_2),e) = \varphi_{(a_1\,a_2)} = \mathrm{wt}\big(p^*_\varphi(a_1,a_2)\big) = \frac{1}{2}D_\varphi((a_1\ a_2),e).
\]

Assume that (20) holds for $2\le|c|<l$. We show that it also holds for $|c| = l$. We use Figure 4 for illustrative purposes. In
all figures in this section, undirected edges describe the defining tree, while directed edges describe the cycle at hand.

Without loss of generality, assume that the defining path of $\varphi$, $\Theta$, equals $(1,2,\dots,n)$. Furthermore, assume that $a_1 =
\min\{i : i\in c\}$; if this were not the case, we could rewrite $c$ by cyclically shifting its elements. Let $a_t = \min\{i : i\in c, i\ne a_1\}$
be the "closest" element to $a_1$ in $\Theta$ (that is, the closest element to $a_1$ in the cycle $c$). For example, in Figure 4 one has
$c = (2\ 6\ 4\ 7\ 5)$, $a_1 = 2$ and $a_t = 4$. We have

\[
c = (a_1\ a_2\ \cdots\ a_t\ \cdots\ a_l) = (a_1\ a_2\ \cdots\ a_t)(a_t\ a_{t+1}\ \cdots\ a_l),
\]

and thus

\[
\begin{aligned}
d_\varphi(c,e) &\le d_\varphi((a_1\ a_2\ \cdots\ a_t),e) + d_\varphi((a_t\ a_{t+1}\ \cdots\ a_l),e) \\
&\le \frac{1}{2}\Big[\sum_{i=1}^{t-1}\mathrm{wt}\big(p^*_\varphi(a_i,a_{i+1})\big) + \mathrm{wt}\big(p^*_\varphi(a_t,a_1)\big)\Big]
 + \frac{1}{2}\Big[\sum_{i=t}^{l-1}\mathrm{wt}\big(p^*_\varphi(a_i,a_{i+1})\big) + \mathrm{wt}\big(p^*_\varphi(a_l,a_t)\big)\Big] \\
&= \frac{1}{2}\sum_{i=1}^{l}\mathrm{wt}\big(p^*_\varphi(a_i,c(a_i))\big) \\
&= \frac{1}{2}D_\varphi(c,e),
\end{aligned}
\]

where the second inequality follows from the induction hypothesis, while the first equality follows from the fact that
$\mathrm{wt}(p^*_\varphi(a_t,a_1)) + \mathrm{wt}(p^*_\varphi(a_l,a_t)) = \mathrm{wt}(p^*_\varphi(a_l,a_1))$. ■

Figure 5: If each of the cycles of a permutation lies on a path, the method of Lemma 30 can be used to find the weighted
transposition distance.

The approach described in the proof of Lemma 30 can also be applied to the problem of finding the weighted transposition
distance when the weight function is a metric-tree weight function and each of the cycles of the permutation consists of elements
that lie on some path in the defining tree. An example of such a permutation and such a weight function is shown in Figure 5.
Note that in this example, a cycle consisting of elements 3, 5, 7 would not correspond to a path.

In such a case, for each cycle $c$ of $\pi$ we can use the path in the defining tree that contains the elements of $c$ to show that

\[
d_\varphi(c,e) = \frac{1}{2}D_\varphi(c,e). \tag{21}
\]

For example, the cycle (1 4 6) lies on the path (1, 2, 3, 4, 5, 6) and the cycle (5 8) lies on the path (5, 4, 7, 8). Since (21) holds
for each cycle $c$ of $\pi$, we have

\[
d_\varphi(\pi,e) = \frac{1}{2}D_\varphi(\pi,e).
\]

A similar scenario in which essentially the same argument as that of the proof of Lemma 30 can be used is as follows:
the defining tree has one vertex with degree three and no vertices with degree larger than three (i.e., a tree with a Y shape),
and for each cycle $c$ of $\pi$, there are two branches of the tree that do not contain two consecutive elements of $c$. It can then be
shown that each such cycle can be decomposed into cycles that lie on paths in the defining tree, reducing the problem to the
previously described one. An example is shown in Figure 6.

One may argue that the results of Lemma 30 and its extension to metric-trees have limited application, as they require that
both the defining tree and the permutations/rankings used in the computation be of special form. In particular, one may require
that a given ranking $\pi$ is such that there are no edges between two different branches of $\Theta$ in the cycle graph of $\pi$. We show
next that under certain conditions the probability of such permutations goes to one as $n\to\infty$, by lower bounding the number
$P_n$ of permutations with the given constraint.

Let the set of vertices in the $i$th branch of a Y-shaped defining tree $\Theta$, $i = 1,2,3$, be denoted by $B_i$ and let $b_i$ denote the
number of vertices in $B_i$. Clearly, $b_1 + b_2 + b_3 + 1 = n$.

Assume, without loss of generality, that the numbering of the branches is such that $b_1 \ge b_2 \ge b_3$. As an illustration, in
Figure 3b we have

\[
B_1 = \{1,2,3\},\ b_1 = 3, \qquad B_2 = \{5,6\},\ b_2 = 2, \qquad B_3 = \{7,8\},\ b_3 = 2.
\]

Figure 6: The cycle (1 6 5 2 8 3 7) in Figure (a) is decomposed into two cycles, (1 6 5 2) and (2 8 3 7), as shown in Figure
(b). Note that (1 6 5 2 8 3 7) = (1 6 5 2)(2 8 3 7).

The quantity $P_n$ is greater than or equal to the number of permutations $\pi$ whose cycle decomposition does not contain
an edge between $B_2$ and $B_3$, and this quantity is, in turn, greater than or equal to the number of permutations $\pi$ such that
$\pi(j)\notin B_2\cup B_3$ for $j\in B_2\cup B_3$. The number of permutations with the latter property equals $\binom{b_1+1}{b_2+b_3}(b_2+b_3)!\,(b_1+1)!$. Hence,

\[
P_n \ge \frac{\big((b_1+1)!\big)^2}{(b_1+1-b_2-b_3)!},
\]

and thus

\[
\frac{P_n}{n!} \ge \frac{\prod_{j=n+1-2b_2-2b_3}^{\,n-b_2-b_3} j}{\prod_{j=n+1-b_2-b_3}^{\,n} j}.
\]

In particular, if $b_2 = b_3 = 1$, we have

\[
\frac{P_n}{n!} \ge \frac{(n-3)(n-2)}{(n-1)\,n} = 1 - \frac{4}{n} + O(n^{-2}),
\]

and more generally, if $b_2 + b_3 = o(n)$, then

\[
\frac{P_n}{n!} \ge \frac{(n+o(n))^{b_2+b_3}}{(n+o(n))^{b_2+b_3}} \sim 1,
\]

or equivalently, $P_n \sim n!$.

Hence, if $b_2 + b_3 = o(n)$, the distance $d_\varphi(\pi,e)$ of a randomly chosen permutation $\pi$ from the identity equals $D_\varphi(\pi,e)/2$
with probability approaching 1 as $n\to\infty$.
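
The counting bound above is easy to evaluate exactly for concrete branch sizes. The following sketch computes the lower bound on $P_n/n!$ with exact rational arithmetic; the function name `lower_bound_fraction` is illustrative and the code only evaluates the bound, it does not count $P_n$ itself.

```python
from fractions import Fraction
from math import comb, factorial


def lower_bound_fraction(n, b2, b3):
    """Lower bound on P_n / n! for a Y-shaped tree with short branches b2 and b3."""
    b1_plus_1 = n - b2 - b3  # long branch plus the degree-three vertex
    # Choose distinct images outside B2 and B3 for the b2 + b3 short-branch vertices,
    # then permute the remaining b1 + 1 vertices freely (comb is 0 if impossible).
    count = comb(b1_plus_1, b2 + b3) * factorial(b2 + b3) * factorial(b1_plus_1)
    return Fraction(count, factorial(n))


for n in (10, 100, 1000):
    print(n, float(lower_bound_fraction(n, 1, 1)))
```

For $b_2 = b_3 = 1$ the printed values follow $(n-3)(n-2)/((n-1)n)$ and approach 1 as $n$ grows.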

It is worth noting that for metric-tree weight functions, the equality of Lemma 30 is not, in general, satisfied. To prove this
claim, consider the metric-tree weight function $\varphi$ in Figure 7 where, for $a,b\in[n]$, $a<b$,

\[
\varphi_{(a\,b)} = \begin{cases} 1, & \text{if } a = 1, \\ 2, & \text{if } a \ne 1. \end{cases}
\]

Figure 7: For the above metric-tree weight function and $\pi = (2\ 3\ 4)$, the equality of Lemma 30 does not hold.

It can be shown that for the permutation $\pi = (2\ 3\ 4)$, $d_\varphi(\pi,e) = 4$, while $\frac{1}{2}D_\varphi(\pi,e) = 3$.
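
For small $n$, both Lemma 30 and the counterexample above can be checked numerically by computing $d_\varphi(\pi,e)$ as a shortest path in the graph whose vertices are permutations and whose edges are weighted transpositions. The sketch below is an illustration under stated assumptions, not the authors' code: positions are labelled $0,\dots,n-1$, a permutation is a tuple whose entry $i$ is the image of $i$, the star in `star_weights` has unit edge weights with centre 0 (which matches the values $4$ and $3$ quoted above), and all function names are hypothetical.

```python
import heapq
from itertools import combinations, permutations


def exact_distance(pi, phi):
    """d_phi(pi, e): cheapest sequence of transpositions sorting pi (Dijkstra)."""
    n = len(pi)
    identity, start = tuple(range(n)), tuple(pi)
    best = {start: 0}
    heap = [(0, start)]
    while heap:
        cost, cur = heapq.heappop(heap)
        if cur == identity:
            return cost
        if cost > best[cur]:
            continue  # stale queue entry
        for a, b in combinations(range(n), 2):
            nxt = list(cur)
            nxt[a], nxt[b] = nxt[b], nxt[a]
            nxt = tuple(nxt)
            new_cost = cost + phi[a][b]
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    raise ValueError("identity not reachable")


def D_metric(pi, phi):
    """D_phi(pi, e) for metric-tree weights, where wt(p*(a, b)) = phi[a][b]."""
    return sum(phi[i][pi[i]] for i in range(len(pi)))


def path_weights(edge_weights):
    """Metric-path weights: phi[a][b] is the weight of the path segment a..b."""
    prefix = [0]
    for w in edge_weights:
        prefix.append(prefix[-1] + w)
    n = len(prefix)
    return [[abs(prefix[a] - prefix[b]) for b in range(n)] for a in range(n)]


def star_weights(n):
    """Assumed star with unit edges and centre 0: 1 to the centre, 2 between leaves."""
    return [[0 if a == b else (1 if 0 in (a, b) else 2) for b in range(n)]
            for a in range(n)]


def violations(phi):
    """Permutations for which d_phi differs from D_phi / 2."""
    n = len(phi)
    return [p for p in permutations(range(n))
            if 2 * exact_distance(p, phi) != D_metric(p, phi)]


print(violations(path_weights([1, 2, 3])))       # expected to be empty by Lemma 30
cycle = (0, 2, 3, 1)                               # three-cycle on the leaves
print(exact_distance(cycle, star_weights(4)), D_metric(cycle, star_weights(4)) / 2)
```

The search visits up to $n!$ permutations, so it is meant only as a sanity check on small instances.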

The following lemma provides a 2-approximation for transposition distances based on extended-path weight functions.
As the weighted Kendall distance is a special case of the weighted transposition distance with extended-path weight functions,
the lemma also implies Prop. 16.

Lemma 31. For an extended-path weight function $\varphi$ and for $\pi,\sigma\in\mathbb{S}_n$,

\[
\frac{1}{2}D_\varphi(\pi,\sigma) \le d_\varphi(\pi,\sigma) \le D_\varphi(\pi,\sigma).
\]

Proof: The lower bound follows from Lemma 27. To prove the upper bound, consider a metric-path weight function $\varphi'$,
with the same defining path as $\varphi$, such that

\[
\varphi'_{(a\,b)} = 2\varphi_{(a\,b)}
\]

for any pair $a,b$ adjacent in $\Theta$. From Lemma 25, it follows that for distinct $c,d\in[n]$,

\[
d_\varphi((c\ d),e) \le 2\,\mathrm{wt}\big(p^*_\varphi(c,d)\big) = \mathrm{wt}\big(p^*_{\varphi'}(c,d)\big) = d_{\varphi'}((c\ d),e).
\]

Hence,

\[
d_\varphi(\pi,\sigma) \le d_{\varphi'}(\pi,\sigma) = \frac{1}{2}D_{\varphi'}(\pi,\sigma) = D_\varphi(\pi,\sigma),
\]

which proves the claimed result. ■

V. Aggregation Algorithms 

Despite the importance of the rank aggregation problem in many areas of information retrieval, only a handful of results
regarding the complexity of the problem are known. Among them, the most important is the fact that finding a Kemeny
optimal solution is NP-hard (see 0, [23] and references therein). Since the Kendall $\tau$ distance is a special case of the weighted
Kendall distance, finding the aggregate ranking for the latter is also NP-hard. In particular, exhaustive search approaches,
akin to the one we used in the previous sections, are not computationally feasible for large problems.

However, assuming that $\pi^*$ is the solution to (1), the vote in $\Sigma$ closest to $\pi^*$ provides a 2-approximation for the aggregate
ranking. This easily follows from the fact that the Kendall $\tau$ distance satisfies the triangle inequality. As a result, one only has
to evaluate the pairwise distances of the votes in $\Sigma$ in order to identify a 2-approximation aggregate for the problem. Assuming
the weighted Kendall distance can be computed efficiently (for example, if the weight function is monotonic), the same is true
of the weighted Kendall distance, as it is also a metric and thus satisfies the triangle inequality.
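
The "best vote" 2-approximation described above only needs the pairwise distances between votes. The sketch below illustrates it with the unweighted Kendall $\tau$ distance; `best_vote` is a hypothetical helper that accepts any metric `distance(pi, sigma)`, so a weighted Kendall distance could be substituted if an efficient routine for it is available.

```python
from itertools import combinations


def kendall_tau(pi, sigma):
    """Count the candidate pairs that the two rankings order differently."""
    rank_pi = {c: r for r, c in enumerate(pi)}
    rank_sigma = {c: r for r, c in enumerate(sigma)}
    return sum(
        1
        for a, b in combinations(pi, 2)
        if (rank_pi[a] - rank_pi[b]) * (rank_sigma[a] - rank_sigma[b]) < 0
    )


def best_vote(votes, distance):
    """Return the vote with the smallest cumulative distance to all votes."""
    costs = [sum(distance(v, u) for u in votes) for v in votes]
    i = min(range(len(votes)), key=costs.__getitem__)
    return votes[i], costs[i]


# Votes of Example 24; by the triangle inequality the returned cost is at most
# twice the optimal cumulative Kendall tau distance.
votes = [
    (1, 2, 3, 4, 5, 6),
    (1, 2, 3, 4, 5, 6),
    (3, 6, 5, 2, 1, 4),
    (3, 6, 5, 2, 1, 4),
    (5, 4, 1, 6, 3, 2),
]
print(best_vote(votes, kendall_tau))
```

The routine uses $O(m^2)$ distance evaluations for $m$ votes and never leaves the set of submitted rankings.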

A second method for obtaining a 2-approximation is an extension of a bipartite matching algorithm. For any distance function
that may be written as

\[
d(\pi,\sigma) = \sum_{k=1}^{n} f\big(\pi^{-1}(k),\sigma^{-1}(k)\big), \tag{22}
\]

where $f$ denotes an arbitrary non-negative function, one can find an exact solution to (1) as described in the next section. The
matching algorithm approach for classical Kendall $\tau$ aggregation was first proposed in [8].

A. Vote Aggregation Using Matching Algorithms 

Consider a complete weighted bipartite graph $G = (X,Y)$, with $X = \{1,2,\dots,n\}$ corresponding to the $n$ ranks to be filled
in, and $Y = \{1,2,\dots,n\}$ corresponding to the elements of $[n]$, i.e., the candidates. Let $(i,j)$ denote an edge between $i\in X$
and $j\in Y$. We say that a perfect bipartite matching $P$ corresponds to a permutation $\pi$ whenever $(i,j)\in P$ if and only if
$\pi(i) = j$. If the weight of $(i,j)$ equals

\[
\sum_{l=1}^{m} f\big(i,\sigma_l^{-1}(j)\big),
\]

i.e., the weight incurred by $\pi(i) = j$, the minimum weight perfect matching corresponds to a solution of (1). The distance of
(22) is a generalized version of Spearman's footrule since Spearman's footrule [7] can be obtained by choosing $f(x,y) = |x-y|$.
Below, we explain how to use the matching approach for aggregation based on a general weighted Kendall distance. More
details about this approach may be found in our companion conference paper [13].

Recall that for a weighted Kendall distance with weight function $\varphi$,

\[
D_\varphi(\pi,\sigma) = \sum_{i=1}^{n} w\big(\pi^{-1}(i) : \sigma^{-1}(i)\big),
\]

where

\[
w(k : l) = \begin{cases}
\sum_{h=k}^{l-1} \varphi_{(h\,h+1)}, & \text{if } k < l, \\
\sum_{h=l}^{k-1} \varphi_{(h\,h+1)}, & \text{if } k > l, \\
0, & \text{if } k = l.
\end{cases}
\]

Note that $D_\varphi$ is a distance measure of the form of (22), and thus a solution to problem (1) for $d = D_\varphi$ can be found exactly
in polynomial time.
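
The matching formulation can be sketched as follows. To keep the example self-contained, the minimum weight perfect matching is found here by enumerating all assignments, which is exponential in $n$; this is a stand-in chosen for illustration, and a polynomial-time assignment solver (for example the Hungarian method) would take its place in practice. The names `segment_weight` and `assignment_aggregate` are hypothetical, `w[h-1]` is assumed to hold $\varphi_{(h\,h+1)}$, and rankings are tuples listing candidates from best to worst.

```python
from itertools import permutations


def segment_weight(w):
    """Return f(k, l) = w(k : l), the sum of w_h over ranks h between k and l."""
    prefix = [0]
    for x in w:
        prefix.append(prefix[-1] + x)
    return lambda k, l: abs(prefix[l - 1] - prefix[k - 1])


def assignment_aggregate(votes, f):
    """Minimise sum_l d(pi, sigma_l) for d of the form (22) via rank assignment."""
    candidates = sorted(votes[0])
    n = len(candidates)
    ranks = [{c: r + 1 for r, c in enumerate(v)} for v in votes]
    # cost[i][j]: cumulative weight of placing candidates[j] at rank i + 1.
    cost = [[sum(f(i + 1, rk[c]) for rk in ranks) for c in candidates]
            for i in range(n)]
    # Brute-force stand-in for a minimum weight perfect matching solver.
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    ranking = tuple(candidates[j] for j in best)
    return ranking, sum(cost[i][best[i]] for i in range(n))


votes = [(1, 2, 3), (1, 2, 3), (3, 2, 1)]
print(assignment_aggregate(votes, segment_weight((1, 1))))  # unit weights: footrule
```

With unit weights `segment_weight` reduces to $|k-l|$, so the example performs Spearman footrule aggregation.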

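Suppose that the set of votes is given by $\Sigma = \{\sigma_1,\dots,\sigma_m\}$.

Proposition 32. Let $\pi' = \arg\min_{\pi}\sum_{i=1}^{m} D_\varphi(\pi,\sigma_i)$ and $\pi^* = \arg\min_{\pi}\sum_{i=1}^{m} d_\varphi(\pi,\sigma_i)$. The permutation $\pi'$ is a 2-approximation
to the optimal rank aggregate $\pi^*$ if $\varphi$ corresponds to a weighted Kendall distance.

Proof: From Prop. 16, for a weighted Kendall weight function $\varphi$ and for permutations $\pi$ and $\sigma$,

\[
\frac{1}{2}D_\varphi(\pi,\sigma) \le d_\varphi(\pi,\sigma) \le D_\varphi(\pi,\sigma).
\]

Thus we have

\[
\sum_{i=1}^{m} d_\varphi(\pi',\sigma_i) \le \sum_{i=1}^{m} D_\varphi(\pi',\sigma_i)
\]

and

\[
\sum_{i=1}^{m} D_\varphi(\pi^*,\sigma_i) \le 2\sum_{i=1}^{m} d_\varphi(\pi^*,\sigma_i).
\]

From the optimality of $\pi'$ with respect to $D_\varphi$, we find

\[
\sum_{i=1}^{m} D_\varphi(\pi',\sigma_i) \le \sum_{i=1}^{m} D_\varphi(\pi^*,\sigma_i).
\]

Hence

\[
\sum_{i=1}^{m} d_\varphi(\pi',\sigma_i) \le 2\sum_{i=1}^{m} d_\varphi(\pi^*,\sigma_i).
\]

■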

In fact, the above proposition applies to the larger class of weighted transposition distances with extended-path weights.
It can similarly be shown that for a weighted transposition distance with general weights (resp. metric weights), $\pi'$ is a
4-approximation (resp. a 2-approximation). Finally, for a weighted transposition distance with metric-path weights, $\pi'$ represents
the exact solution.

A simple approach for improving the performance of the matching-based algorithm is to couple it with a local descent
method. Assume that an estimate of the aggregate at step $l$ equals $\pi^{(l)}$. As before, let $A_n$ be the set of adjacent transpositions
in $\mathbb{S}_n$. Then

\[
\pi^{(l+1)} = \pi^{(l)} \arg\min_{\tau\in A_n} \sum_{i=1}^{m} d\big(\pi^{(l)}\tau,\sigma_i\big).
\]

The search terminates when the cumulative distance of the aggregate from the set of votes $\Sigma$ cannot be decreased further. We
choose the starting point of the search to be the ranking $\pi'$ of Prop. 32, obtained by the minimum weight bipartite matching algorithm.
This method will henceforth be referred to as Bipartite Matching with Local Search (BMLS).
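
The local descent step can be sketched as below. This is an illustrative version that assumes the Kendall $\tau$ distance for concreteness and an arbitrary starting ranking; in BMLS the start would be the matching-based ranking $\pi'$ and `distance` would be the weighted distance of interest. The names `total_distance` and `local_search` are hypothetical.

```python
from itertools import combinations


def kendall_tau(pi, sigma):
    """Count the candidate pairs that the two rankings order differently."""
    rank_pi = {c: r for r, c in enumerate(pi)}
    rank_sigma = {c: r for r, c in enumerate(sigma)}
    return sum(
        1
        for a, b in combinations(pi, 2)
        if (rank_pi[a] - rank_pi[b]) * (rank_sigma[a] - rank_sigma[b]) < 0
    )


def total_distance(ranking, votes, distance):
    """Cumulative distance of a ranking from the set of votes."""
    return sum(distance(ranking, v) for v in votes)


def local_search(start, votes, distance):
    """Repeatedly apply the best improving adjacent transposition until none exists."""
    current = tuple(start)
    current_cost = total_distance(current, votes, distance)
    while True:
        best_neighbor, best_cost = None, current_cost
        for s in range(len(current) - 1):
            neighbor = list(current)
            neighbor[s], neighbor[s + 1] = neighbor[s + 1], neighbor[s]
            cost = total_distance(neighbor, votes, distance)
            if cost < best_cost:
                best_neighbor, best_cost = tuple(neighbor), cost
        if best_neighbor is None:
            return current, current_cost
        current, current_cost = best_neighbor, best_cost


votes = [
    (1, 2, 3, 4, 5, 6),
    (1, 2, 3, 4, 5, 6),
    (3, 6, 5, 2, 1, 4),
    (3, 6, 5, 2, 1, 4),
    (5, 4, 1, 6, 3, 2),
]
print(local_search((1, 2, 3, 4, 5, 6), votes, kendall_tau))
```

Each iteration strictly decreases the cumulative distance, so the loop terminates at a ranking that no single adjacent transposition can improve; such a local optimum need not be globally optimal.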

An important question at this point is how the approximate nature of the BMLS aggregation process changes the
aggregate, especially with respect to the top-vs-bottom or similarity property. This question is hard, and we currently have
no mathematical results pertaining to this problem. Instead, we describe a number of simulation results that may guide future
analysis of this issue.

In order to see the effect of the BMLS on vote aggregation, we revisit Examples 17-20. In all except for one case the
solution provided by BMLS is the same as the exact solution, both for the Kendall $\tau$ and weighted Kendall distances.

The exception is Example 18. In this case, for the weight function $\varphi_{(i\,i+1)} = (2/3)^{i-1}$, $i\in[3]$, the exact solution equals
(1, 4, 2, 3) but the solution obtained via BMLS equals (4, 2, 3, 1). Note that these two solutions differ significantly in terms of
their placement of candidate 1, ranked first in the exact ranking and last in the approximate ranking. The distance between the
two solutions, $d_\varphi((1,4,2,3),(4,2,3,1))$, equals 2.11 and is rather large. Nevertheless, the cumulative distances to the votes
are very close in value:

\[
\sum_i d_\varphi((1,4,2,3),\sigma_i) = 9,
\qquad
\sum_i d_\varphi((4,2,3,1),\sigma_i) = 9.11.
\]

Hence, as with any other distance-based approach, the approximation result may sometimes diverge significantly from the
optimum solution while the closeness of the approximate solution to the set of votes is nearly the same as that of the optimum
solution. One way to avoid such approximation errors is to use weight functions with sufficiently large "spreads" of weights
for which the difference between solutions has to be smaller than a given threshold. This topic will be discussed elsewhere.



B. Vote Aggregation Using PageRank 

An algorithm for data fusion based on the PageRank and HITS algorithms for ranking web pages was proposed in [8],
[23]. PageRank is one of the most important algorithms developed for search engines used by Google, with the aim of scoring
webpages based on their relevance. Each webpage that has hyperlinks to other webpages is considered a voter, while the
voter's preferences for candidates are expressed via the hyperlinks. When a hyperlink to a webpage is not present, it is assumed
that the voter does not support the given candidate's webpage. Although the exact implementation details of PageRank are not
known, it is widely assumed that the graph of webpages is endowed with near-uniform transition probabilities. The ranking
of the webpages is obtained by computing the stationary probabilities of the chain, and ordering the pages according to the
values of the stationary probabilities. The connectivity of the Markov chain provides information about pairwise candidate
preferences, and states with high input probability correspond to candidates ranked highly in a large number of lists.

This idea can be easily adapted to the rank aggregation problem with weighted distances in several different settings. In such
an adaptation, the states of a Markov chain correspond to the candidates and the transition probabilities are functions of the
votes. Dwork et al. [8], [9] proposed four different ways for computing the transition probabilities from the votes. Below, we
describe the method that is most suitable for our problem and provide a generalization of the algorithm for weighted distance
aggregation.

Consider a Markov chain with states indexed by the candidates. Let $P$ denote the transition probability matrix of the Markov
chain, with $P_{ij}$ denoting the probability of going from state (candidate) $i$ to state $j$. In [8], the transition probabilities are
evaluated as

\[
P_{ij} = \frac{1}{m}\sum_{\sigma\in\Sigma} P_{ij}(\sigma),
\]

where

\[
P_{ij}(\sigma) = \begin{cases}
\frac{1}{n}, & \text{if } \sigma^{-1}(j) < \sigma^{-1}(i), \\
0, & \text{if } \sigma^{-1}(j) > \sigma^{-1}(i).
\end{cases}
\]

Our Markov chain model for weighted Kendall distance is similar, with a modification that includes incorporating transposition
weights into the transition probabilities. To accomplish this task, we proceed as follows.
Let $w_k = \varphi_{(k\,k+1)}$, and let $i_\sigma = \sigma^{-1}(i)$ for candidate $i\in[n]$. We set

\[
\beta_{ij}(\sigma) = \max_{l:\, j_\sigma \le l < i_\sigma} \frac{\sum_{h=l}^{i_\sigma-1} w_h}{i_\sigma - l} \tag{23}
\]

if $j_\sigma < i_\sigma$, $\beta_{ij}(\sigma) = 0$ if $j_\sigma > i_\sigma$, and

\[
\beta_{ii}(\sigma) = \sum_{k:\, k_\sigma > i_\sigma} \beta_{ki}(\sigma).
\]

The transition probabilities equal

\[
P_{ij} = \frac{1}{m}\sum_{k=1}^{m} P_{ij}(\sigma_k),
\]

with

\[
P_{ij}(\sigma) = \frac{\beta_{ij}(\sigma)}{\sum_{k}\beta_{ik}(\sigma)}.
\]

Figure 8: The Markov chain for Example 33.

Intuitively, the transition probabilities described above may be interpreted as follows. The transition probabilities are obtained
by averaging the transition probabilities corresponding to individual votes $\sigma\in\Sigma$. For each vote $\sigma$, consider candidates $j$ and $k$
with $j_\sigma = i_\sigma - 1$ and $k_\sigma = i_\sigma - 2$. The probability of going from candidate $i$ to candidate $j$ is proportional to $w_{j_\sigma} = \varphi_{(j_\sigma\,i_\sigma)}$.
This implies that if $w_{j_\sigma} > 0$, one moves from candidate $i$ to candidate $j$ with positive probability. Furthermore, larger values
for $w_{j_\sigma}$ result in higher probabilities for moving from $i$ to $j$.

In the case of candidate $k$, it seems reasonable to let the probability of transitioning from candidate $i$ to candidate $k$ be
proportional to $\frac{w_{j_\sigma}+w_{k_\sigma}}{2}$. However, since $k$ is ranked before $j$ by vote $\sigma$, it is natural to require that the probability of moving
to candidate $k$ from candidate $i$ be at least as high as the probability of moving to candidate $j$ from candidate $i$. This reasoning
leads to $\beta_{ik} = \max\{w_{j_\sigma}, \frac{w_{j_\sigma}+w_{k_\sigma}}{2}\}$ and motivates using the maximum in (23). Finally, the probability of staying with candidate
$i$ is proportional to the sum of the $\beta$'s from candidates placed below candidate $i$.

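Example 33. Let the votes in $\Sigma$ consist of $\sigma_1 = (a,b,c)$, $\sigma_2 = (a,b,c)$, and $\sigma_3 = (b,c,a)$, and let $w = (w_1,w_2) = (2,1)$.
Consider the vote $\sigma_1 = (a,b,c)$. We have $\beta_{ba}(\sigma_1) = \frac{w_1}{1} = 2$. Note that if $w_1$ is large, then $\beta_{ba}$ is large as well.
In addition, $\beta_{cb}(\sigma_1) = \frac{w_2}{1} = 1$ and

\[
\beta_{ca}(\sigma_1) = \max\Big\{\frac{w_1+w_2}{2},\ \beta_{cb}(\sigma_1)\Big\} = \frac{3}{2}.
\]

The purpose of the max function is to ensure that $\beta_{ca} \ge \beta_{cb}$, which is a natural requirement given that $a$ is ranked before $b$
according to $\sigma_1$.

Finally, $\beta_{aa}(\sigma_1) = \beta_{ca}(\sigma_1) + \beta_{ba}(\sigma_1) = \frac{3}{2} + 2 = \frac{7}{2}$ and $\beta_{bb}(\sigma_1) = \beta_{cb}(\sigma_1) = 1$. Note that according to the transition
probability model, one also has $\beta_{aa} \ge \beta_{bb}$. This may again be justified by the fact that $\sigma_1$ places $a$ higher than $b$.
Since $\sigma_1 = \sigma_2$, we have, with rows and columns ordered as $(a,b,c)$,

\[
P(\sigma_1) = P(\sigma_2) = \begin{pmatrix} 1 & 0 & 0 \\ \frac{2}{3} & \frac{1}{3} & 0 \\ \frac{3}{5} & \frac{2}{5} & 0 \end{pmatrix}.
\]

Similar computations yield

\[
\beta_{cb}(\sigma_3) = 2, \quad \beta_{ac}(\sigma_3) = 1, \quad \beta_{ab}(\sigma_3) = \tfrac{3}{2}, \quad
\beta_{aa}(\sigma_3) = 0, \quad \beta_{bb}(\sigma_3) = 2 + \tfrac{3}{2} = \tfrac{7}{2}, \quad \beta_{cc}(\sigma_3) = 1,
\]

and thus

\[
P(\sigma_3) = \begin{pmatrix} 0 & \frac{3}{5} & \frac{2}{5} \\ 0 & 1 & 0 \\ 0 & \frac{2}{3} & \frac{1}{3} \end{pmatrix}.
\]

From $P(\sigma_1)$, $P(\sigma_2)$, and $P(\sigma_3)$, we obtain

\[
P = \frac{P(\sigma_1) + P(\sigma_2) + P(\sigma_3)}{3}
= \begin{pmatrix} \frac{2}{3} & \frac{1}{5} & \frac{2}{15} \\ \frac{4}{9} & \frac{5}{9} & 0 \\ \frac{2}{5} & \frac{22}{45} & \frac{1}{9} \end{pmatrix}.
\]

The Markov chain corresponding to $P$ is given in Figure 8. The stationary distribution of this Markov chain is (0.56657,
0.34844, 0.084986), which corresponds to the ranking $(a,b,c)$.

Table I: The aggregate rankings and the average distance of the aggregate ranking from the votes for different weight functions
w.

Method   w = (1,0,0,0)          w = (1,1,1,1)          w = (1,1,0,0)         w = (0,1,0,0)
-------  ---------------------  ---------------------  --------------------  --------------------
OPT      (1,4,3,2,5), 0.7273    (2,3,4,5,1), 2.3636    (2,3,4,5,1), 1.455    (3,2,5,4,1), 0.636
BMLS     (1,2,3,4,5), 0.7273    (2,3,4,5,1), 2.3636    (2,3,1,5,4), 1.455    (2,3,1,5,4), 0.636
MC       (1,2,5,4,3), 0.7273    (2,3,4,5,1), 2.3636    (2,1,3,4,5), 1.546    (2,3,1,4,5), 0.636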
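
The computations of Example 33 can be reproduced with a few lines of code. The sketch below follows the definition of $\beta_{ij}(\sigma)$ in (23) as read from the text; it is not the authors' implementation. The names `beta`, `transition_matrix` and `stationary` are illustrative, `w[h-1]` is assumed to hold $w_h$, the stationary distribution is approximated by power iteration, and the treatment of a row whose $\beta$ values are all zero (the chain stays put) is an added assumption that the example does not exercise.

```python
def beta(vote, w):
    """beta[i][j] for one vote, following (23); vote lists candidates best first."""
    rank = {c: r + 1 for r, c in enumerate(vote)}
    b = {i: {j: 0.0 for j in vote} for i in vote}
    for i in vote:
        for j in vote:
            if rank[j] < rank[i]:
                b[i][j] = max(
                    sum(w[h - 1] for h in range(l, rank[i])) / (rank[i] - l)
                    for l in range(rank[j], rank[i])
                )
    for i in vote:
        # Self-loop weight: sum of the betas from candidates ranked below i.
        b[i][i] = sum(b[k][i] for k in vote if rank[k] > rank[i])
    return b


def transition_matrix(votes, w, candidates):
    """Average over votes of the row-normalised beta matrices."""
    m = len(votes)
    P = {i: {j: 0.0 for j in candidates} for i in candidates}
    for vote in votes:
        b = beta(vote, w)
        for i in candidates:
            row_sum = sum(b[i].values())
            for j in candidates:
                if row_sum > 0:
                    P[i][j] += b[i][j] / row_sum / m
                elif i == j:
                    P[i][j] += 1.0 / m  # assumption: stay put if the row is all zero
    return P


def stationary(P, candidates, steps=2000):
    """Approximate the stationary distribution by power iteration."""
    dist = {c: 1.0 / len(candidates) for c in candidates}
    for _ in range(steps):
        dist = {j: sum(dist[i] * P[i][j] for i in candidates) for j in candidates}
    return dist


candidates = ("a", "b", "c")
votes = [("a", "b", "c"), ("a", "b", "c"), ("b", "c", "a")]
P = transition_matrix(votes, (2, 1), candidates)
print(stationary(P, candidates))
```

Sorting the candidates by decreasing stationary probability then gives the aggregate ranking.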

Example 34. The performance of the Markov chain approach described above cannot be easily evaluated analytically, as is
the case with any related aggregation algorithm proposed so far.

We hence test the performance of the scheme on examples for which the optimal solutions are easy to evaluate numerically.
For this purpose, in what follows, we consider a simple test example, with $m = 11$. The set of votes (rankings) is listed below

\[
\Sigma = \begin{pmatrix}
\vdots \\
\cdots \;\; 4 \;\; 4 \;\; \cdots \;\; 5 \;\; 5 \;\; 3 \;\; 3 \;\; 4 \;\; 4 \\
5 \;\; 5 \;\; 5 \;\; 1 \;\; 1 \;\; 1 \;\; 1 \;\; 1 \;\; 1 \;\; 1 \;\; 1
\end{pmatrix}^{T}.
\]

Note that due to the transpose operator, each column corresponds to a vote, e.g., $\sigma_1 = (1,2,3,4,5)$.

Let us consider candidates 1 and 2. Using the majority rule, one would arrive at the conclusion that candidate 1 should 
be the winner, given that 1 appears most often at the top of the list. Under a number of other aggregation rules, including 
Kemeny's rule and Borda's method, candidate 2 would be the winner. 



Our immediate goal is to see how different weighted distance based rank aggregation algorithms would position candidates
1 and 2. The numerical results regarding this example are presented in Table I. In the table, OPT refers to an optimal solution
which was found by exhaustive search, and MC refers to the Markov chain method.

If the weight function is $w = (w_1,\dots,w_4) = (1,0,0,0)$, where $w_i = \varphi_{(i\,i+1)}$, the optimal aggregate vote clearly corresponds
to the plurality winner. That is, the winner is the candidate ranked first by the largest number of voters. A quick
check of Table I reveals that all three methods identify the winner correctly. Note that the ranks of candidates other than
candidate 1 differ across the methods; however, this does not affect the distance between the aggregate
ranking and the votes.

The next weight function that we consider is the uniform weight function, $w = (1,1,1,1)$. This weight function corresponds
to the conventional Kendall $\tau$ distance. As shown in Table I, all three methods produce (2,3,4,5,1), so the aggregates
returned by BMLS and MC are optimal.

The weight function $w = (1,1,0,0)$ corresponds to ranking of the top 2 candidates. OPT and BMLS return 2, 3 as the top
two candidates, both preferring 2 to 3. The MC method, however, returns 2, 1 as the top two candidates, with a preference
for 2 over 1, and a suboptimal cumulative distance. This may be attributed to the fact that the MC
method is not designed solely to minimize the average distance: another important factor in determining the winners via the
MC method is that winning against strong candidates "makes one strong". In this example, candidate 1 beats the strongest
candidate, candidate 2, three times, while candidate 3 beats candidate 2 only twice, and this fact seems to be the reason for the
MC algorithm to prefer candidate 1 to candidate 3. Nevertheless, the stationary probabilities of candidates 1 and 3 obtained
by the MC method are very close to each other, as the vector of probabilities is (0.137, 0.555, 0.132, 0.0883, 0.0877).

The weight function $w = (0,1,0,0)$ corresponds to identifying the top 2 candidates, i.e., it is not important which candidate
is the first and which is the second. OPT and BMLS identify {2, 3} as the top two candidates.

The MC method returns the stationary probabilities (0,1,0,0,0), which means that candidate 2 is an absorbing state in the
Markov chain. This occurs because candidate 2 is ranked first or second by all voters. The existence of absorbing states is a
drawback of the Markov chain methods. One solution is to remove 2 from the votes and re-apply MC. The MC method in this
case results in the stationary distribution $(p(1),p(3),p(4),p(5)) = (0.273, 0.364, 0.182, 0.182)$, which gives us the ranking
(3, 1, 4, 5). Together with the fact that candidate 2 is the strongest candidate, we obtain the ranking (2, 3, 1, 4, 5).
VI. Appendix

A. Computing the Weight Functions with Two Identical Non-zero Weights 

The goal is to find the weighted Kendall distance d ¥ ,(7r, e) with the weight function of ( fTSl i. for an arbitrary tt £ S n . For 
this purpose, let R\ = {1, • • ■ , a}, i? 2 = {a + 1, • • • ,b}, and R 3 = {b + 1, • ■ • , n}, and define 

A£ = |{fc G : 7r _1 (Ar) G i,j G {1,2,3}. 

That is, A^ is the number of elements whose ranks in tt belong to the set R4 and whose ranks in e belong to the set Rj. 
A sequence of transpositions that transforms tt into e moves the iVy elements of {k G i?j : 7r _1 (fc) G from to 
Furthermore, note that any transposition that swaps two elements with ranks in the same region R^i e [3], has weight zero, 
while for any transposition 77 that swaps an element ranked in R\ with an element ranked in R2 or swaps an element ranked 
in R2 with an element ranked in R 3 , we have d v (ri, e) = 1. 

It is straightforward to see that £\ Nfi = £\ N ji- In particular, iVf 2 + iVf 3 = A^ + A^ and A^ + A^ 2 = iVf 3 + N£ 3 . 

We show next that 

2 JVf 3 + iVf 2 + iVfa , if > 1 or 2V& > 1 , 
2JV& + 1, if AT- =^ = 0. 



d y (7r,e) 

Note that, from Prop. [16] we have 



d v (n, e)>-D v (tt, e) = 2N& + A^ 2 + . (24) 

Suppose that A^i > 1 or A^ > 1. We find a transposition 77, with e) = 1, such that 7r' = 7rr; satisfies Dm(tt', e) = 

D v (ir, e) — 2, and at least one of the following conditions: 

or 

> 1, (25) 

or 

tt' = e. 

Applying the same argument repeatedly, and using the triangle inequality, proves that d p (7r, e) < ^D v (w, e) if AT^i > 1 or 
A r 2 3 > 1. This, along with (|24j, shows that d v {rr, e) = I-D^tt, e) if A^ > 1 or N% 3 > 1. 

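First, suppose that $N_{21}^{\pi} \ge 1$ and $N_{23}^{\pi} \ge 1$. It then follows that $N_{12}^{\pi} \ge 1$ or $N_{32}^{\pi} \ge 1$. Without loss of generality, assume that
$N_{12}^{\pi} \ge 1$. Then $\tau_l$ can be chosen such that $N_{12}^{\pi'} = N_{12}^{\pi} - 1$ and $N_{21}^{\pi'} = N_{21}^{\pi} - 1$. We have $D_\varphi(\pi', e) = D_\varphi(\pi, e) - 2$, and since
$N_{23}^{\pi'} \ge 1$, condition (25) holds.

Next, suppose $N_{21}^{\pi} \ge 1$ and $N_{23}^{\pi} = 0$. If $N_{13}^{\pi} \ge 1$, choose $\tau_l$ such that

\[ N_{21}^{\pi'} = N_{21}^{\pi} - 1, \qquad N_{23}^{\pi'} = N_{23}^{\pi} + 1 = 1, \qquad N_{13}^{\pi'} = N_{13}^{\pi} - 1, \]

where $\pi' = \pi\tau_l$. Since $N_{23}^{\pi'} = 1$, condition (25) is satisfied. If $N_{13}^{\pi} = 0$, then $N_{31}^{\pi} = N_{32}^{\pi} = 0$, and thus $N_{12}^{\pi} = N_{21}^{\pi} \ge 1$. In
this case, we choose $\tau_l$ such that $N_{21}^{\pi'} = N_{12}^{\pi'} = N_{21}^{\pi} - 1$. As a result, we have either $N_{21}^{\pi'} \ge 1$ or $\pi' = e$. Hence, condition
(25) is satisfied once again. Note that in both cases, for $N_{13}^{\pi} = 0$ as well as for $N_{13}^{\pi} \ge 1$, we have $D_\varphi(\pi', e) = D_\varphi(\pi, e) - 2$.
The proof for the case $N_{23}^{\pi} \ge 1$ and $N_{21}^{\pi} = 0$ follows along similar lines.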

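If $N_{21}^{\pi} = N_{23}^{\pi} = 0$, it can be verified by inspection that for every transposition $\tau_l$ with $d_\varphi(\tau_l, e) = 1$, we have $D_\varphi(\pi\tau_l, e) \ge D_\varphi(\pi, e)$.
Hence, the inequality in (24) cannot be satisfied with equality, which implies that $d_\varphi(\pi, e) \ge 2N_{13}^{\pi} + 1$. Choose a
transposition $\tau_l$ with $d_\varphi(\tau_l, e) = 1$ such that

\[ N_{13}^{\pi'} = N_{13}^{\pi} - 1, \qquad N_{23}^{\pi'} = 1, \]

where $\pi' = \pi\tau_l$. Since $\pi'$ falls under the first case of the formula, with $N_{12}^{\pi'} = N_{23}^{\pi'} = 1$, we have

\[ d_\varphi(\pi, e) \le d_\varphi(\tau_l, e) + d_\varphi(\pi', e) = 1 + 2N_{13}^{\pi}. \]

This, along with $d_\varphi(\pi, e) \ge 2N_{13}^{\pi} + 1$, completes the proof.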
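
The closed form proved above can be checked numerically on small instances. The sketch below compares it with a shortest-path computation over adjacent transpositions, where the transposition $(i\ i+1)$ has weight 1 for $i \in \{a, b\}$ and 0 otherwise. The function names are illustrative, the permutation is stored as the tuple of elements listed by rank, and the value 0 returned when no element leaves its region is our reading of the degenerate case rather than part of the stated formula.

```python
import heapq
from itertools import permutations


def region_counts(pi, a, b):
    """Return N[i][j] = #{k in R_j : rank of k in pi lies in R_i} (0-based i, j)."""
    def region(x):
        return 0 if x <= a else (1 if x <= b else 2)

    counts = [[0] * 3 for _ in range(3)]
    for rank, k in enumerate(pi, start=1):  # pi[rank - 1] is the element at that rank
        counts[region(rank)][region(k)] += 1
    return counts


def closed_form(pi, a, b):
    """Closed-form weighted Kendall distance to the identity."""
    N = region_counts(pi, a, b)
    n12, n13, n21, n23 = N[0][1], N[0][2], N[1][0], N[1][2]
    if n21 >= 1 or n23 >= 1:
        return 2 * n13 + n12 + n23
    # Here N21 = N23 = 0, which forces N12 = N32 = 0 and N13 = N31.
    return 2 * n13 + 1 if n13 >= 1 else 0  # 0 for the degenerate case (assumption)


def brute_force(n, a, b):
    """Dijkstra from the identity; edges are adjacent transpositions."""
    weight = [1 if i in (a, b) else 0 for i in range(1, n)]  # weight[i-1] = phi(i, i+1)
    start = tuple(range(1, n + 1))
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, p = heapq.heappop(heap)
        if d > dist[p]:
            continue
        for i in range(n - 1):
            q = list(p)
            q[i], q[i + 1] = q[i + 1], q[i]
            q = tuple(q)
            nd = d + weight[i]
            if nd < dist.get(q, float("inf")):
                dist[q] = nd
                heapq.heappush(heap, (nd, q))
    return dist


n, a, b = 5, 2, 4
dist = brute_force(n, a, b)
mismatches = [p for p in permutations(range(1, n + 1))
              if closed_form(p, a, b) != dist[p]]
print("mismatches:", len(mismatches))
```

Because adjacent transpositions are involutions with symmetric weights, the shortest-path distance from $e$ to $\pi$ equals the one from $\pi$ to $e$, which is why a single search from the identity suffices.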
B. The Average Kendall and Weighted Kendall Distance

The Kendall $\tau$ distance between two rankings may be viewed in the following way: each pair of candidates on which the
two rankings disagree contributes one unit to the distance between the rankings. Owing to the algorithm for computing it presented in the main text, the weighted Kendall
distance with a decreasing weight function can be regarded in a similar manner: each pair of candidates on which the two
rankings disagree contributes $\varphi_{(s\ s+1)}$, for some $s$, to the distance between the rankings.

Consider a pair $a$ and $b$ such that $\pi^{-1}(b) < \pi^{-1}(a)$ and $\sigma^{-1}(a) < \sigma^{-1}(b)$. In that algorithm there exists a transposition
$\tau = (s\ s+1)$ that swaps $a$ and $b$, where

\[ s = \pi^{-1}(b) + \left| \{ k : \sigma^{-1}(k) < \sigma^{-1}(a),\ \pi^{-1}(k) > \pi^{-1}(b) \} \right|, \]

that is, $s$ equals $\pi^{-1}(b)$ plus the number of elements that appear before $a$ in $\sigma$ and after $b$ in $\pi$. It is not hard to see that $s$
can also be written in a way that is symmetric with respect to $\pi$ and $\sigma$, as

\begin{align*}
s &= \pi^{-1}(b) + \sigma^{-1}(a) - \left| \{ k : \pi^{-1}(k) < \pi^{-1}(b),\ \sigma^{-1}(k) < \sigma^{-1}(a) \} \right| - 1 \\
  &= n - 1 - \left| \{ k : \pi^{-1}(k) > \pi^{-1}(b),\ \sigma^{-1}(k) > \sigma^{-1}(a) \} \right|.
\end{align*}
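As an example, consider $\varphi_{(i\ i+1)} = n - i$. Then,

\begin{align*}
d_\varphi(\pi, \sigma) &= \sum_{(b,a) \in \mathcal{I}(\pi,\sigma)} \left( 1 + \left| \{ k : \pi^{-1}(k) > \pi^{-1}(b),\ \sigma^{-1}(k) > \sigma^{-1}(a) \} \right| \right) \\
 &= K(\pi, \sigma) + \sum_{(b,a) \in \mathcal{I}(\pi,\sigma)} \left| \{ k : \pi^{-1}(k) > \pi^{-1}(b),\ \sigma^{-1}(k) > \sigma^{-1}(a) \} \right|,
\end{align*}

where $\mathcal{I}(\pi, \sigma)$ is the set of ordered pairs $(b, a)$ such that $\pi^{-1}(b) < \pi^{-1}(a)$ and $\sigma^{-1}(a) < \sigma^{-1}(b)$. Note that the weighted
Kendall distance $d_\varphi$ equals the Kendall $\tau$ distance $K(\pi, \sigma)$ plus a sum that captures the influence of assigning higher importance to
the top positions of the rankings.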
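
The three expressions for $s$ and the resulting pair-sum form of the distance for weights $n - i$ can be illustrated with a short script. This is a sketch under the conventions used above: a ranking is stored as a tuple of elements listed by rank, so that the rank of an element plays the role of $\pi^{-1}(\cdot)$. The function names are hypothetical, and the script evaluates the formula as stated rather than running the algorithm it is derived from.

```python
def rank_of(perm):
    """Map each element to its 1-based rank; perm[i - 1] is the element at rank i."""
    return {x: i for i, x in enumerate(perm, start=1)}


def swap_positions(pi, sigma):
    """For each disagreeing pair (b, a), return s computed in the three ways above."""
    n = len(pi)
    rp, rs = rank_of(pi), rank_of(sigma)
    out = {}
    for b in pi:
        for a in pi:
            # (b, a) disagrees: b precedes a in pi, a precedes b in sigma.
            if rp[b] < rp[a] and rs[a] < rs[b]:
                s1 = rp[b] + sum(1 for k in pi if rs[k] < rs[a] and rp[k] > rp[b])
                s2 = (rp[b] + rs[a] - 1
                      - sum(1 for k in pi if rp[k] < rp[b] and rs[k] < rs[a]))
                s3 = n - 1 - sum(1 for k in pi if rp[k] > rp[b] and rs[k] > rs[a])
                out[(b, a)] = (s1, s2, s3)
    return out


def kendall_tau(pi, sigma):
    """Number of pairs on which the two rankings disagree."""
    return len(swap_positions(pi, sigma))


def weighted_kendall_linear(pi, sigma):
    """Pair-sum formula for the weights phi(i, i+1) = n - i: each pair adds n - s."""
    n = len(pi)
    return sum(n - s for (s, _, _) in swap_positions(pi, sigma).values())


identity = (1, 2, 3)
for pi in [(3, 2, 1), (2, 3, 1), (3, 1, 2)]:
    print(pi, kendall_tau(pi, identity), weighted_kendall_linear(pi, identity))
```

For $n = 3$ the weights are 2 and 1, and the values printed for the three rankings can be confirmed by hand, for example by sorting $(3, 2, 1)$ with the transpositions $(2\ 3)$, $(1\ 2)$, $(2\ 3)$ at total cost $1 + 2 + 1$.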

These observations allow us to easily compute the expected value of the distance between the identity permutation and a
randomly and uniformly chosen permutation $\pi \in \mathbb{S}_n$. For $1 \le a < b \le n$ and $s \in [n-1]$, let $X_{ab}^{s}$ be an indicator variable that
equals one if and only if $\pi^{-1}(a) > \pi^{-1}(b)$ and

\[ \left| \{ k > a : \pi^{-1}(k) > \pi^{-1}(b) \} \right| = n - 1 - s. \]

The expected distance between the two permutations equals

\[ E[d_\varphi(\pi, e)] = \sum_{s=1}^{n-1} \varphi_{(s\ s+1)} \sum_{a=1}^{n-1} \sum_{b=a+1}^{n} E\left[X_{ab}^{s}\right]. \tag{26} \]

By the definition of $X_{ab}^{s}$, $E[X_{ab}^{s}]$ equals the probability of the event that $n - 1 - s$ elements of $\{a+1, \dots, n\} \setminus \{b\}$ and $a$
appear after $b$ in $\pi$. There are $\binom{n-a-1}{n-s-1}$ ways to choose $n - s - 1$ elements from $\{a+1, \dots, n\} \setminus \{b\}$, $\binom{n}{a-1}(a-1)!$ ways to
assign positions to the elements of $\{1, 2, \dots, a-1\}$, $(s-a)!$ ways to arrange the $s - a$ elements of $\{a+1, \dots, n\} \setminus \{b\}$ that
appear before $b$, and $(n-s)!$ ways to arrange $a$ and the $n - 1 - s$ elements of $\{a+1, \dots, n\} \setminus \{b\}$ that appear after $b$. Hence,

\begin{align*}
E\left[X_{ab}^{s}\right] &= \frac{1}{n!} \binom{n-a-1}{n-s-1} \binom{n}{a-1} (a-1)!\,(s-a)!\,(n-s)! \\
 &= \frac{n-s}{(n-a+1)(n-a)},
\end{align*}

for $1 \le a \le s$, and $E[X_{ab}^{s}] = 0$ for $a > s$. Using this expression in (26), we obtain

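\begin{align*}
E[d_\varphi(e, \pi)] &= \sum_{s=1}^{n-1} \varphi_{(s\ s+1)} \sum_{a=1}^{s} \frac{n-s}{n-a+1} \\
 &= \sum_{s=1}^{n-1} \varphi_{(s\ s+1)} (n-s)\left(H_n - H_{n-s}\right),
\end{align*}

where $H_i = \sum_{l=1}^{i} \frac{1}{l}$. Indeed, for $\varphi_{(s\ s+1)} = 1$, $s \in [n-1]$, we recover the well known result that

\begin{align*}
E[d_\varphi(e, \pi)] &= \sum_{s=1}^{n-1} (n-s)\left(H_n - H_{n-s}\right) \\
 &= \sum_{k=1}^{n-1} k\left(H_n - H_k\right) \\
 &= \frac{1}{2}\binom{n}{2}.
\end{align*}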
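
The probability $E[X_{ab}^{s}]$ used in this derivation can be confirmed for small $n$ by direct enumeration. The sketch below counts, over all permutations of $[n]$, those for which the indicator equals one and compares the exact fraction with $(n-s)/((n-a+1)(n-a))$. The function names are illustrative, and the permutation is again stored as the tuple of elements listed by rank.

```python
from fractions import Fraction
from itertools import permutations


def indicator_probability(n, a, b, s):
    """Exact P(X^s_ab = 1), obtained by enumerating all permutations of [n]."""
    hits = total = 0
    for pi in permutations(range(1, n + 1)):
        rank = {x: i for i, x in enumerate(pi, start=1)}
        total += 1
        # Number of elements k > a that appear after b in pi.
        after_b = sum(1 for k in range(a + 1, n + 1) if rank[k] > rank[b])
        if rank[a] > rank[b] and after_b == n - 1 - s:
            hits += 1
    return Fraction(hits, total)


def closed_form_probability(n, a, s):
    """(n - s) / ((n - a + 1)(n - a)) for a <= s, and zero for a > s."""
    if a > s:
        return Fraction(0)
    return Fraction(n - s, (n - a + 1) * (n - a))


n = 4
all_match = all(
    indicator_probability(n, a, b, s) == closed_form_probability(n, a, s)
    for a in range(1, n)
    for b in range(a + 1, n + 1)
    for s in range(1, n)
)
print("closed form matches enumeration:", all_match)
```

Exact rational arithmetic is used so that the comparison does not depend on floating-point tolerance.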
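For $\varphi_{(s\ s+1)} = n - s$, the average distance equals

\begin{align*}
E[d_\varphi(e, \pi)] &= \sum_{s=1}^{n-1} (n-s)^2\left(H_n - H_{n-s}\right) \\
 &= \sum_{k=1}^{n-1} k^2\left(H_n - H_k\right) \\
 &= \frac{1}{2}\binom{n}{2} + \frac{2}{3}\binom{n}{3}.
\end{align*}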
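
Both closed forms can be checked against the general sum $\sum_{s} \varphi_{(s\ s+1)} (n-s)(H_n - H_{n-s})$ with exact arithmetic. The following is a minimal sketch; the helper names and the range of $n$ are arbitrary choices made for illustration.

```python
from fractions import Fraction
from math import comb


def harmonic(i):
    """H_i = 1 + 1/2 + ... + 1/i as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, l) for l in range(1, i + 1)), Fraction(0))


def expected_distance(n, phi):
    """Sum over s of phi(s) * (n - s) * (H_n - H_{n-s}); phi(s) is the weight of (s s+1)."""
    h_n = harmonic(n)
    return sum((phi(s) * (n - s) * (h_n - harmonic(n - s)) for s in range(1, n)),
               Fraction(0))


def uniform_closed_form(n):
    """(1/2) * C(n, 2), the average Kendall tau distance."""
    return Fraction(comb(n, 2), 2)


def linear_closed_form(n):
    """(1/2) * C(n, 2) + (2/3) * C(n, 3), for the weights n - s."""
    return Fraction(comb(n, 2), 2) + Fraction(2 * comb(n, 3), 3)


for n in range(2, 8):
    uniform = expected_distance(n, lambda s: 1)
    linear = expected_distance(n, lambda s, n=n: n - s)
    print(n, uniform, uniform_closed_form(n), linear, linear_closed_form(n))
```

For $n = 3$ the second average is $13/6$, which agrees with $\frac{1}{2}\binom{3}{2} + \frac{2}{3}\binom{3}{3} = \frac{3}{2} + \frac{2}{3}$.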